(57) Abstract

A novel gene, DDM1, and its encoded protein are provided. The gene was isolated from a region of Arabidopsis thaliana chromosome
5. DDM1 appears to be part of the SWI2/SNF2 family of chromatin-remodeling proteins. Disruption of the gene results in DNA
hypomethylation, among other phenotypes. The DDM1 gene defines a novel member of the DNA methylation system. Methods of using
DDM1, and transgenic organisms comprising DDM1, are also provided.

PLANT GENE THAT REGULATES DNA METHYLATION

Pursuant to 35 U.S.C. §202(c), it is
acknowledged that the U.S. Government has certain rights
in the invention described herein, which was made in part
with funds from the National Science Foundation, Grant
Nos. MCB9306266 and BIR9256779.

This application claims priority to U.S.
Provisional Application Serial No. 60/ , filed
April 30, 1998, and to U.S. Application No. 09/104,070,
filed June 24, 1998, the entireties of which are
incorporated by reference herein.

FIELD OF THE INVENTION 

This invention relates to the field of plant 
molecular biology, genetic engineering and regulation of 
gene expression. In particular, this invention provides 
a novel gene, DDM1, which plays an important role in the
regulation of DNA methylation, and resultant regulation 
of gene expression, in plant genomic DNA. 

BACKGROUND OF THE INVENTION 

Various publications or patents are cited in 
this application to describe the state of the art to 
which the invention pertains. Each of these publications 
or patents is incorporated by reference herein. 

Plant genomes contain substantial amounts of
5-methylcytosine. Up to 20-30% of the cytosines are
methylated in the nuclear genome of many flowering
plants. As in other organisms, methylation of cytosine
residues in plants occurs post-replicatively through the
action of cytosine-DNA methyltransferases. Plant DNA
methyltransferases have been characterized biochemically,
and plant genes encoding these enzymes have been isolated
by virtue of their similarity to their mammalian
counterparts.

Investigations of native plant genes and 
transgenic plants containing foreign genes have found a 
general correlation between transcriptional inactivity 
and increased DNA methylation, consistent with evidence
from mammalian systems. This evidence supports a role 
for cytosine methylation in maintaining transcriptional 
states. 

The plant's need for developmental plasticity
and environmental interaction suggests that plants
extensively employ epigenetic regulatory strategies.
Such strategies rely on heritable, often reversible,
changes in access to the underlying genetic information,
but not alteration of the primary nucleotide sequence.
As one example, the alteration of DNA methylation is
expected to perturb plant development significantly,
provided that differential DNA methylation is an
important component of epigenetic regulation in plants.

One paradigm linking DNA methylation and
developmental regulation comes from work on the mouse,
where average genome cytosine methylation levels in
embryonic lineages drop sharply in the early cleavages
following fertilization, then rise again around the time
of implantation. In plants, a similar pattern has been
observed in studies of DNA methylation content in pollen
and post-embryonic tissue of varying age. Information
from such studies indicates that there is a gradual rise
in 5-methylcytosine levels in post-embryonic tissues
produced by meristems at positions further from the base
of the plant (i.e., tissues of increasing age). Genetic
studies of transposon systems in maize also demonstrate
an age-dependent gradient of increasing epigenetic
modification, which is correlated with DNA methylation.

Both biochemical and genetic approaches have 
been taken to alter DNA methylation in eucaryotic 
organisms. Methylation inhibitor treatments have induced 
developmental abnormalities in many plant species. 
Transgenic plants expressing antisense molecules specific 
for a native cytosine methyltransferase gene have been
found to exhibit genomic hypomethylation, presumably due
to the antisense interference with expression of the
gene.

In another approach, mutants of Arabidopsis
thaliana have been isolated that show a decrease in DNA
methylation (ddm), resulting in reduced nuclear
5-methylcytosine levels. The best characterized mutations
define the DDM1 gene. Homozygotes carrying recessive
ddm1 alleles contain 30% of the wild-type levels of
5-methylcytosine. The ddm1 mutations do not map to the two
known cytosine-DNA methyltransferase genes of A.
thaliana, nor do they affect DNA methyltransferase
activity detectable in nuclear extracts (Kakutani et al.,
Nuc. Acids Res. 23: 130-137, 1995). In addition, ddm1
mutations do not appear to affect the metabolism of the
active methyl group donor, S-adenosylmethionine (Kakutani
et al., 1995, supra).

For the foregoing reasons, the DDM1 gene
product is likely to be a novel component of the DNA
methylation system, or involved in determining the
cellular context (e.g., chromatin structure, subnuclear
localization) of the methylation reaction. Consequently,
it would be a clear advance in the art of plant molecular
and cellular biology to identify and isolate the DDM1
gene and/or its encoded protein. Such a gene and protein
would find utility for the purpose of modifying the
methylation status of a selected genome and thereby
altering one or more regulatory features of gene
expression from that genome.



SUMMARY OF THE INVENTION 

A novel gene, DDM1, and its encoded protein are
provided in accordance with the present invention. The
gene has been identified as a novel element of the DNA
methylation system.

In one aspect of the invention, an isolated
nucleic acid molecule comprising a gene located on
Arabidopsis thaliana chromosome 5, lower arm, is
provided. The gene occupies a segment of chromosome 5,
lower arm, which is flanked on the centromeric side
within 20 kilobases by a gene encoding a zinc-finger
protein and on the telomeric side within 1 kilobase by a
gene encoding a glutamic acid tRNA. Disruption of the
gene is associated with DNA hypomethylation. The gene
encodes a polypeptide of about 764 amino acids in length.
The nucleotide sequence of the DDM1 gene is set forth
herein as SEQ ID NO:1 and its deduced amino acid sequence
as SEQ ID NO:2. In SEQ ID NO:1, the regions of the gene
that comprise coding sequence are indicated.

In another aspect of the invention, an isolated
DDM1 gene is provided, having a sequence selected from
the group consisting of: (a) SEQ ID NO:1; (b) an allelic
variant or natural mutant of SEQ ID NO:1; (c) a sequence
hybridizing with part or all of SEQ ID NO:1 or its
complement and encoding a polypeptide substantially the
same as part or all of a polypeptide encoded by SEQ ID
NO:1; (d) a sequence encoding part or all of a
polypeptide having amino acid SEQ ID NO:2; and (e) a
sequence encoding part or all of a polypeptide contained
in the cosmid clone C38, designated ATCC Accession No.
207208.

According to another aspect of the invention, a
polypeptide is provided, which is produced by expression
of an isolated nucleic acid molecule comprising part or
all of an open reading frame of a gene located on
Arabidopsis thaliana chromosome 5, lower arm, the gene
occupying a segment of chromosome 5, lower arm, flanked
on the centromeric side within 20 kilobases by a gene
encoding a zinc-finger protein and on the telomeric side
within 1 kilobase by a gene encoding a glutamic acid
tRNA. This polypeptide preferably has the amino acid
sequence of part or all of SEQ ID NO:2.

According to another aspect of the invention,
an isolated protein encoded by an Arabidopsis thaliana
gene is provided, which is a member of an SWI2/SNF2
family of polypeptides. Loss of function of the protein
is associated with DNA hypomethylation. The protein is
encoded by a gene located on A. thaliana chromosome 5,
lower arm, centromerically flanked within 20 kilobases by
a zinc finger-encoding gene and telomerically within one
kilobase by a gene encoding a glutamic acid tRNA.

According to another aspect of the invention, a 
transgenic organism comprising the DDM1 gene is provided.
In one embodiment, the transgenic organism is a plant. 

In other aspects of the invention, methods are
provided for stabilizing fidelity of DNA methylation in
an organism, which comprise transforming the organism
with the DDM1 gene. Methods are also provided for
reducing or eliminating gene silencing in a plant, or for
inducing inbreeding depression in a plant, which comprise
inhibiting or preventing expression of an endogenous DDM1
gene of the plant.

These aspects of the invention, as well as
other features and advantages of the invention, will be
described in greater detail in the description and
examples set forth below.



BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. Map-based isolation of the A.
thaliana DDM1 gene. A genetic map of the region of A.
thaliana chromosome 5 containing the DDM1 gene is shown
at the top of the figure (see Example 1). The relative
sizes of the genetic intervals were determined by the
number of recombination breakpoints (rec bkpts) scored in
a panel of recombinant lines containing cross-overs
between flanking markers yi and ajba. The regions
represented in genomic clones T10D21 and C38 are denoted
by the open boxes below the genetic map. The ~30 kb
interval containing the DDM1 gene, defined by the genetic
markers A and D, is shown at the bottom of the figure.
The numbers of recombination breakpoints scored between
markers A-D and ddm1-2 are indicated. The positions of
predicted coding regions in the interval are numbered and
shown below the physical map. BAC, bacterial artificial
chromosome; SuDH, succinate dehydrogenase structural
gene.

Figure 2. DDM1 gene structure and
identification. Fig. 2A: The intron/exon structure of
the DDM1 gene. Protein-coding exons are shown as open
boxes, with the start and stop codons indicated. Introns
are depicted as thin lines. The position and nature of
four ddm1 alleles are indicated above the exon/intron
map. Fig. 2B: RT-PCR analysis of ddm1-2 and wild-type
DDM1 transcripts. The approximate positions of
oligonucleotide primers used in the analysis are shown
below the map in Fig. 2A. Amplifications were done on
either genomic templates (DNA), first-strand cDNA
templates (+RT, plus reverse transcriptase), or
mock-synthesized cDNA (-RT, minus reverse transcriptase).
Amplified products were separated on a 3% agarose gel and
visualized after ethidium bromide staining.
Amplification from cDNA representing the properly spliced
transcript resulted in a ~280 bp product. The nucleotide
sequence of the ~220 bp product amplified from ddm1-2
cDNA template indicated that the mutation leads to use of
an alternate splice donor 56 bp upstream of the wild-type
splice donor site.

Figure 3. The A. thaliana DDM1 gene encodes a
SWI2/SNF2-like protein. The deduced primary amino acid
sequence of DDM1 (At DDM1) is aligned with two other
SWI2/SNF2-like protein sequences, Mus musculus lymphocyte
specific helicase (Mm LSH; SEQ ID NO:4) and human SNF2h
(Hs SNF2h; SEQ ID NO:5). Sequence identities are
indicated by black boxes and conservative changes are
shaded. The positions of the eight signature motifs
characteristic of SNF2 family proteins are indicated
below the aligned sequences. Amino acid coordinates are
indicated on the left; only the N-terminal 730 amino
acids (of 1052 total) are shown for human SNF2h, though
SEQ ID NO:5 shows the entire protein sequence. The
deletion/frameshift caused by the ddm1-2 allele occurs at
amino acid 524. The ddm1-6 frameshift occurs at amino
acid 379, leading to translation of an additional 52
amino acids out of frame. The ddm1-7 nonsense mutation
occurs at amino acid 549. Dashes indicate gaps
introduced by the CLUSTAL W algorithm to maximize
alignment (Thompson et al., Nucleic Acids Res. 22:
4673-4680, 1994). The alignment was processed by BOXSHADE
v. 3.21.



DETAILED DESCRIPTION OF THE INVENTION

I. Definitions

Various terms relating to the biological
molecules of the present invention are used throughout
the specification and claims.

With reference to nucleic acids of the
invention, the term "isolated nucleic acid" is sometimes
used. This term, when applied to DNA, refers to a DNA
molecule that is separated from sequences with which it
is immediately contiguous (in the 5' and 3' directions)
in the naturally occurring genome of the organism from
which it was derived. For example, the "isolated nucleic
acid" may comprise a DNA molecule inserted into a vector,
such as a plasmid or virus vector, or integrated into the
genomic DNA of a procaryote or eucaryote. An "isolated
nucleic acid molecule" may also comprise a cDNA molecule.

With respect to RNA molecules of the invention,
the term "isolated nucleic acid" primarily refers to an
RNA molecule encoded by an isolated DNA molecule as
defined above. Alternatively, the term may refer to an
RNA molecule that has been sufficiently separated from
RNA molecules with which it would be associated in its
natural state (i.e., in cells or tissues), such that it
exists in a "substantially pure" form (the term
"substantially pure" is defined below).

With respect to protein, the term "isolated
protein" or "isolated and purified protein" is sometimes
used herein. This term refers primarily to a protein
produced by expression of an isolated nucleic acid
molecule of the invention. Alternatively, this term may
refer to a protein which has been sufficiently separated
from other proteins with which it would naturally be
associated, so as to exist in "substantially pure" form.

The term "substantially pure" refers to a
preparation comprising at least 50-60% by weight the
compound of interest (e.g., nucleic acid,
oligonucleotide, protein, etc.). More preferably, the
preparation comprises at least 75% by weight, and most
preferably 90-99% by weight, the compound of interest.
Purity is measured by methods appropriate for the
compound of interest (e.g., chromatographic methods,
agarose or polyacrylamide gel electrophoresis, HPLC
analysis, and the like).

Nucleic acid sequences and amino acid sequences
can be compared using computer programs that align the
similar sequences of the nucleic or amino acids and thus
define the differences. In the comparisons made in the
present invention, the CLUSTAL W program and parameters
employed therein were utilized (Thompson et al., 1994,
supra). However, equivalent alignments and
similarity/identity assessments can be obtained through
the use of any standard alignment software. For
instance, the GCG Wisconsin Package version 9.1,
available from the Genetics Computer Group in Madison,
Wisconsin, and the default parameters used (gap creation
penalty=12, gap extension penalty=4) by that program may
also be used to compare sequence identity and similarity.

The term "substantially the same" refers to
nucleic acid or amino acid sequences having sequence
variations that do not materially affect the nature of the
protein (i.e., the structure, stability characteristics,
substrate specificity and/or biological activity of the
protein). With particular reference to nucleic acid
sequences, the term "substantially the same" is intended
to refer to the coding region and to conserved sequences
governing expression, and refers primarily to degenerate
codons encoding the same amino acid, or alternate codons
encoding conservative substitute amino acids in the
encoded polypeptide. With reference to amino acid
sequences, the term "substantially the same" refers
generally to conservative substitutions and/or variations
in regions of the polypeptide not involved in
determination of structure or function.

The terms "percent identical" and "percent
similar" are also used herein in comparisons among amino
acid and nucleic acid sequences. When referring to amino
acid sequences, "percent identical" refers to the percent
of the amino acids of the subject amino acid sequence
that have been matched to identical amino acids in the
compared amino acid sequence by a sequence analysis
program. "Percent similar" refers to the percent of the
amino acids of the subject amino acid sequence that have
been matched to identical or conserved amino acids.
Conserved amino acids are those which differ in structure
but are similar in physical properties such that the
exchange of one for another would not appreciably change
the tertiary structure of the resulting protein.
Conservative substitutions are defined in Taylor (1986,
J. Theor. Biol. 119:205). When referring to nucleic acid
molecules, "percent identical" refers to the percent of
the nucleotides of the subject nucleic acid sequence that
have been matched to identical nucleotides by a sequence
analysis program.
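
As an illustration of the two measures defined above, the following minimal Python sketch computes percent identity and percent similarity from a pair of already-aligned amino acid sequences. The function names, the gap character and, in particular, the conservative groups are assumptions chosen for the example; the groups are a simplified stand-in and do not reproduce the classification of Taylor (1986). The toy sequences are likewise hypothetical.

```python
# Illustrative conservative groups (an assumption for this sketch, not
# the Taylor 1986 classification): residues in the same set are treated
# as conserved substitutions for one another.
CONSERVED_GROUPS = [set("KRH"), set("DE"), set("ILVM"),
                    set("FWY"), set("ST"), set("NQ"), set("AG")]
GAP = "-"  # assumed gap symbol in the aligned sequences


def _is_conserved(a, b):
    """True if a and b are identical or share an illustrative group."""
    return a == b or any(a in g and b in g for g in CONSERVED_GROUPS)


def _percent(subject, compared, match):
    """Percent of subject residues (gaps excluded) satisfying match()."""
    if len(subject) != len(compared):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(s, c) for s, c in zip(subject, compared) if s != GAP]
    if not pairs:
        return 0.0
    hits = sum(1 for s, c in pairs if c != GAP and match(s, c))
    return 100.0 * hits / len(pairs)


def percent_identical(subject, compared):
    """Percent of subject residues matched to identical residues."""
    return _percent(subject, compared, lambda s, c: s == c)


def percent_similar(subject, compared):
    """Percent of subject residues matched to identical or conserved ones."""
    return _percent(subject, compared, _is_conserved)


if __name__ == "__main__":
    subject = "MKV-LAD"   # hypothetical aligned subject sequence
    compared = "MRVQL-E"  # hypothetical aligned compared sequence
    print(percent_identical(subject, compared))
    print(percent_similar(subject, compared))
```

The denominator is the number of residues in the subject sequence, following the wording of the definition; alignment programs may normalize differently (for example, by alignment length), so reported values should state which convention was used.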

With respect to antibodies, the term
"immunologically specific" refers to antibodies that bind
to one or more epitopes of a protein of interest, but
which do not substantially recognize and bind other
molecules in a sample containing a mixed population of
antigenic biological molecules.

With respect to oligonucleotides or other
single-stranded nucleic acid molecules, the term
"specifically hybridizing" refers to the association
between two single-stranded nucleic acid molecules of
sufficiently complementary sequence to permit such
hybridization under pre-determined conditions generally
used in the art (sometimes termed "substantially
complementary"). In particular, the term refers to
hybridization of an oligonucleotide with a substantially
complementary sequence contained within a single-stranded
DNA or RNA molecule, to the substantial exclusion of
hybridization of the oligonucleotide with single-stranded
nucleic acids of non-complementary sequence.

A "coding sequence" or "coding region" refers
to a nucleic acid molecule having sequence information
necessary to produce a gene product, when the sequence is
expressed.

The term "operably linked" or "operably
inserted" means that the regulatory sequences necessary
for expression of the coding sequence are placed in a
nucleic acid molecule in the appropriate positions
relative to the coding sequence so as to enable
expression of the coding sequence. This same definition
is sometimes applied to the arrangement of other
transcription control elements (e.g., enhancers) in an
expression vector.

Transcriptional and translational control
sequences are DNA regulatory sequences, such as
promoters, enhancers, polyadenylation signals,
terminators, and the like, that provide for the
expression of a coding sequence in a host cell. In
particular, as used herein, the term "DNA transcriptional
response element" refers to a DNA sequence specifically
recognized for binding by a DNA binding protein
characterized as a transcriptional regulator (either
activator or suppressor).

The terms "promoter", "promoter region" or
"promoter sequence" refer generally to transcriptional
regulatory regions of a gene, which may be found at the
5' or 3' side of the coding region, or within the coding
region, or within introns. Typically, a promoter is a
DNA regulatory region capable of binding RNA polymerase
in a cell and initiating transcription of a downstream
(3' direction) coding sequence. The typical 5' promoter
sequence is bounded at its 3' terminus by the
transcription initiation site and extends upstream (5'
direction) to include the minimum number of bases or
elements necessary to initiate transcription at levels
detectable above background. Within the promoter
sequence is a transcription initiation site (conveniently
defined by mapping with nuclease S1), as well as protein
binding domains (consensus sequences) responsible for the
binding of RNA polymerase.

A "vector" is a replicon, such as plasmid, 
phage, cosmid, or virus to which another nucleic acid 
segment may be operably inserted so as to bring about the 
replication or expression of the segment. 

The term "nucleic acid construct" or "DNA
construct" is sometimes used to refer to a coding
sequence or sequences operably linked to appropriate
regulatory sequences and inserted into a vector for
transforming a cell. This term may be used
interchangeably with the term "transforming DNA". Such a
nucleic acid construct may contain a coding sequence for
a gene product of interest, along with a selectable
marker gene and/or a reporter gene.

The term "reporter gene" refers to genetic
sequences which may be operably linked to a promoter
region forming a transgene, such that expression of the
reporter gene coding region is regulated by the promoter
and expression of the transgene is readily assayed.

The term "selectable marker gene" refers to a
gene product that when expressed confers a selectable
phenotype, such as antibiotic resistance, on a
transformed cell or plant.

The term "DNA construct" is sometimes used
herein to refer to genetic sequence used to transform
plants and generate progeny transgenic plants. These
constructs may be administered to plants in a viral or
plasmid vector. Other methods of delivery such as
Agrobacterium T-DNA mediated transformation and
transformation using the biolistic process are also
contemplated to be within the scope of the present
invention. The transforming DNA may be prepared
according to standard protocols such as those set forth
in "Current Protocols in Molecular Biology", eds.
Frederick M. Ausubel et al., John Wiley & Sons, 1995.

A cell has been "transformed" or "transfected"
by an exogenous or heterologous DNA construct when such DNA
has been introduced inside the cell. The transforming
DNA may or may not be integrated (covalently linked) into
the genome of the cell. In prokaryotes, yeast, and plant
cells, for example, the transforming DNA may be maintained
on an episomal element such as a plasmid. With respect to
eukaryotic cells, a stably transformed cell is one in
which the transforming DNA has become integrated into a
chromosome so that it is inherited by daughter cells
through chromosome replication. This stability is
demonstrated by the ability of the eukaryotic cell to
establish cell lines or clones comprised of a population
of daughter cells containing the transforming DNA. A
"clone" is a population of cells derived from a single
cell or common ancestor by mitosis. A "cell line" is a
clone of a primary cell that is capable of stable growth
in vitro for many generations.

II. Description of DDM1 and its Encoded Polypeptide

In accordance with the present invention, a
novel gene, DDM1, has been isolated from the flowering
plant Arabidopsis thaliana. Through analysis of mutant
plants, this gene has been identified as important for
the maintenance of proper genomic cytosine methylation,
and its function appears to be necessary to maintain gene
silencing. Biochemical and molecular genetic results
indicate that DDM1 encodes a novel component of the DNA
methylation machinery.

We have isolated the DDM1 gene from A. thaliana
using a map-based cloning approach, which is described in
detail in Example 1 and shown in Figure 1. Briefly, the
DDM1 gene was initially localized to the bottom of the
lower arm of chromosome 5 by reference to molecular
markers segregating in an F2 family (parental cross:
Columbia ddm1/ddm1 X Landsberg erecta DDM1/DDM1). Next,
recombination breakpoints in the region surrounding a
ddm1 mutation were isolated by collecting cross-over
chromosomes by reference to flanking genetic markers.
The recombination breakpoints delimited a region of
approximately 30 kilobases. Cloned DNA corresponding to
this genomic region was isolated by subcloning DNA from a
bacterial artificial chromosome (BAC) containing
molecular markers mapping both proximal and distal to the
ddm1 marker. The nucleotide sequence of a single cosmid
subclone encompassing the 30 kb region was determined to
identify six candidate genes, in addition to a tRNA gene
and a previously identified succinate dehydrogenase
structural gene.

The search for the DDM1 gene focused on
predicted genes 5 and 6, which fell in the center of the
genetic interval defined by recombination breakpoints
with the ddm1-2 marker. The DDM1 gene was identified as
predicted gene 6 based on DNA sequence alterations in
four ddm1 alleles (Figure 2). The EMS-generated ddm1-2
mutation is a G to A transition in the splice donor site
of intron 11 that forces the use of an alternate splice
donor site 56 bp upstream in exon 11 (Fig. 2B). The
splicing defect leads to a deletion, a frameshift and
premature translation termination upstream of predicted
functional domains. The fast neutron-generated ddm1-5
allele (previously named som8; Mittelsten Scheid, O.,
Afsar, K. & Paszkowski, J. Proc. Natl. Acad. Sci. USA 95:
632-637, 1998) contains an 82 bp insertion (1 bp deleted
and replaced with 83 bp) in the second protein-coding
exon, leading to an in-frame stop after 30 codons (15
wild-type codons plus 15 codons from the insertion).
Premature translation termination is also predicted to
result from two additional fast neutron alleles: ddm1-6
(som4) corresponds to a frameshift (1 bp deletion) in
exon 7 and ddm1-7 (som5) is a nonsense mutation in exon
12. All four characterized ddm1 alleles are expected to
destroy or severely reduce gene function.
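
The reading-frame consequences described above follow from simple arithmetic: a net change in coding sequence length that is not a multiple of three shifts the frame. The short Python sketch below checks this for the three length-changing alleles, using only the base-pair figures given in this description; the function and dictionary names are illustrative assumptions.

```python
def shifts_frame(net_change_bp):
    """A net insertion or deletion shifts the frame unless divisible by 3."""
    return net_change_bp % 3 != 0


# Net change in coding sequence length for each allele (bp), taken from
# the description: ddm1-2 loses 56 bp through the alternate splice donor,
# ddm1-5 loses 1 bp and gains 83 bp, ddm1-6 loses 1 bp.
NET_CHANGE_BP = {
    "ddm1-2": -56,
    "ddm1-5": 83 - 1,
    "ddm1-6": -1,
}

if __name__ == "__main__":
    for allele, change in NET_CHANGE_BP.items():
        print(allele, change, shifts_frame(change))
```

The same subtraction accounts for the RT-PCR result described for Figure 2B: removing 56 bp from the approximately 280 bp wild-type product gives approximately 224 bp, in line with the approximately 220 bp product observed for ddm1-2.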

WO 99/55891    PCT/US99/09268

The wild-type DDM1 gene encodes a predicted
protein of 764 amino acids with a high degree of
similarity to SWI2/SNF2-like proteins. Members of the
SWI2/SNF2 family are involved in various functions,
including transcriptional co-activation, transcriptional
co-repression, chromatin assembly and DNA repair.
Underlying these apparently diverse activities is the
modification or disruption of protein-DNA interactions by
multi-protein complexes which contain SWI2/SNF2-like
components. Figure 3 shows an alignment among the
deduced amino acid sequences of A. thaliana DDM1 and two
mammalian members of the SNF2 family, human SNF2h (SEQ ID
NO:4; Arihara, T. et al., Cytogenet. Cell Genet. 81,
191-193, 1998) and murine LSH (SEQ ID NO:5; lymphocyte
specific helicase, LSH; Jarvis, C.D. et al. Gene 169,
203-207, 1996). DDM1 contains the eight sequence motifs
diagnostic of SWI2/SNF2 family members (Bork, P. &
Koonin, E.V. Nucleic Acids Res. 21, 751-752, 1993). A.
thaliana DDM1 and human SNF2h share 45 percent identity
over the approximately 470 amino acid region comprising
the signature motifs. Over a similar region, A. thaliana
DDM1 and murine LSH display approximately 50 percent
identity, omitting the 47 residues (amino acids 276-322)
apparently unique to LSH. Initial molecular phylogenetic
analysis placed DDM1 in a small subfamily, within the
SNF2 family, which contains proteins of unknown function,
including murine LSH (Eisen, J.A. et al. Nucleic Acids
Res. 23, 2715-2723, 1995). The proteins of known function
most closely related to DDM1 are involved in chromatin
remodeling and are grouped in the SNF2L/ISWI subfamily
(Eisen et al., 1995, supra).

Without intending to be bound by any particular
mechanism for the functionality of the DDM1 gene product,
analysis of the foregoing data indicates that the DDM1
protein functions in the DNA methylation system by
affecting chromatin structure. Two general models for
DDM1 action are envisioned. The DDM1 protein may
function as a transcriptional co-activator, similar to
many SWI2/SNF2-like proteins, to increase the expression
of a component of the DNA methylation system. DDM1 does
not affect DNA methyltransferase expression directly
because ddm1 mutant extracts contain wild-type
methyltransferase activity (Kakutani et al., 1995,
supra). However, an unidentified positive effector of
DNA methylation may be a target. Alternatively, wild-type
DDM1 function may change chromatin structure to direct
certain sequences to the methylation machinery or to
facilitate the methylation of genomic substrates. The
recently discovered interplay between cytosine
methylation and histone acetylation, and the association
of SWI2/SNF2-like proteins and histone deacetylases in
chromatin remodeling complexes, make it plausible that
DDM1 affects DNA methylation through modulation of
histone modification or another aspect of chromatin
structure. Another possibility is that DDM1 plays a more
direct role as a part of a nucleosome remodeling complex
that increases the accessibility of the DNA
methyltransferase to the hemimethylated substrates in
newly replicated chromatin. The latter model is
particularly attractive because it predicts that ddm1
mutations will preferentially hypomethylate genomic
sequences packaged in highly condensed chromatin while
causing slower loss of methylation in more accessible
sequences, consistent with the observed hypomethylation
specificity of ddm1 mutations. The isolation of the
Arabidopsis DDM1 gene in accordance with the present
invention points to the importance of chromatin dynamics
in the maintenance of cytosine methylation patterns and
identifies a novel component of the eukaryotic DNA
methylation pathway.

A number of applications are contemplated for
the novel gene of the invention and its encoded protein,
and the discovery of the involvement of a SWI2/SNF2-like
gene in the eucaryotic DNA methylation system. Such
applications are described in greater detail below.

Although the DDM1 genomic clone from
Arabidopsis thaliana is described and exemplified herein,
this invention is intended to encompass nucleic acid
sequences and proteins from other organisms, including
plants, yeast, insects and mammals, that are sufficiently
similar to be used instead of the Arabidopsis DDM1
nucleic acid and proteins for the purposes described
below. These include, but are not limited to, allelic
variants and natural mutants of SEQ ID NO:1, which are
likely to be found in different species of plants or
varieties of Arabidopsis. Because such variants are
expected to possess certain differences in nucleotide and
amino acid sequence, this invention provides an isolated
DDM1 nucleic acid molecule having at least about 60%
(preferably 70% and more preferably over 80%) sequence
homology in the coding regions with the nucleotide
sequence set forth as SEQ ID NO:1 (and, most preferably,
specifically comprising the coding region of SEQ ID
NO:1). This invention also provides isolated polypeptide
products of the open reading frames of SEQ ID NO:1,
having at least about 60% (preferably 70% or 80% or
greater) sequence homology with the amino acid sequences
of SEQ ID NO:2. Because of the natural sequence
variation likely to exist among DDM1 genes, one skilled
in the art would expect to find up to about 30-40%
nucleotide sequence variation, while still maintaining
the unique properties of the DDM1 gene and encoded
polypeptide of the present invention. Such an
expectation is due in part to the degeneracy of the
genetic code, as well as to the known evolutionary
success of conservative amino acid sequence variations,
which do not appreciably alter the nature of the encoded
protein. Accordingly, such variants are considered
substantially the same as one another and are included
within the scope of the present invention.
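
By way of illustration only, a percentage threshold of the
kind recited above may be evaluated computationally once two
sequences have been aligned. The following Python sketch
computes percent identity over pre-aligned sequences of equal
length. The function names, the convention of skipping gap
columns, and the toy sequences are assumptions introduced for
this example; a comparison against SEQ ID NO:1 would first
require a separate alignment step, which is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped (one of several possible conventions)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences, not taken from SEQ ID NO:1.
    print(percent_identity("ATGGCC", "ATGGAC"))
    print(meets_threshold("AAAA", "AATT"))
```

The default threshold of 60.0 mirrors the "at least about 60%"
figure in the text; the 70% and 80% levels can be passed in the
same way.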

The following description sets forth the
general procedures involved in practicing the present
invention. To the extent that specific materials are
mentioned, it is merely for purposes of illustration and
is not intended to limit the invention. Unless otherwise
specified, general cloning procedures, such as those set
forth in Sambrook et al., Molecular Cloning, Cold Spring
Harbor Laboratory (1989) (hereinafter "Sambrook et al.")
or Ausubel et al. (eds) Current Protocols in Molecular
Biology, John Wiley & Sons (1999) (hereinafter "Ausubel
et al.") are used.

A. Preparation of DDM1 Nucleic Acid
Molecules, Encoded Polypeptides and
Antibodies Specific for the Polypeptides

1. Nucleic Acid Molecules

DDM1 nucleic acid molecules of the invention
may be prepared by two general methods: (1) they may be
synthesized from appropriate nucleotide triphosphates, or
(2) they may be isolated from biological sources. Both
methods utilize protocols well known in the art.

The availability of nucleotide sequence
information, such as the cDNA having SEQ ID NO:1, enables
preparation of an isolated nucleic acid molecule of the
invention by oligonucleotide synthesis. Synthetic
oligonucleotides may be prepared by the phosphoramidite
method employed in the Applied Biosystems 38A DNA
Synthesizer or similar devices. The resultant construct
may be purified according to methods known in the art,
such as high performance liquid chromatography (HPLC).
Long, double-stranded polynucleotides, such as a DNA
molecule of the present invention, must be synthesized in
stages, due to the size limitations inherent in current
oligonucleotide synthetic methods. Thus, for example, a
long double-stranded molecule may be synthesized as
several smaller segments of appropriate complementarity.
Complementary segments thus produced may be annealed such
that each segment possesses appropriate cohesive termini
for attachment of an adjacent segment. Adjacent segments
may be ligated by annealing cohesive termini in the
presence of DNA ligase to construct an entire, long
double-stranded molecule. A synthetic DNA molecule so
constructed may then be cloned and amplified in an
appropriate vector.
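
The staged synthesis described above may be pictured with a
short Python sketch that divides one strand of a duplex into
segments and staggers the break points on the complementary
strand, so that each annealed segment carries cohesive termini
for its neighbor. The segment length, overhang length, function
names and toy sequence are hypothetical values chosen for
illustration; practical oligonucleotide design would also weigh
melting temperature, secondary structure and synthesis limits,
none of which are modeled here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def staggered_oligos(top: str, segment_len: int, overhang: int):
    """Split a duplex into top- and bottom-strand oligonucleotides.

    Top-strand breaks fall every `segment_len` bases. Bottom-strand
    breaks are shifted by `overhang` bases, so adjacent annealed
    segments share complementary single-stranded ends. The two outer
    ends of the assembled duplex are left blunt.
    """
    if not 0 < overhang < segment_len:
        raise ValueError("overhang must be between 1 and segment_len - 1")
    n = len(top)
    top_breaks = list(range(0, n, segment_len)) + [n]
    inner = [b + overhang for b in top_breaks[1:-1] if b + overhang < n]
    bottom_breaks = [0] + inner + [n]
    top_oligos = [top[a:b] for a, b in zip(top_breaks, top_breaks[1:])]
    # Bottom-strand oligos are written 5'->3', hence the reverse complement.
    bottom_oligos = [
        reverse_complement(top[a:b])
        for a, b in zip(bottom_breaks, bottom_breaks[1:])
    ]
    return top_oligos, bottom_oligos


if __name__ == "__main__":
    toy = "ATGGCTCAGCTGCTCAAACTTCGGG"  # toy sequence, not from SEQ ID NO:1
    tops, bottoms = staggered_oligos(toy, segment_len=10, overhang=2)
    print(tops)
    print(bottoms)
```

Joining the top-strand oligonucleotides in order, or the reverse
complements of the bottom-strand oligonucleotides in order,
regenerates the full-length sequence, which is a convenient
check on a proposed design.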

DDM1 genes also may be isolated from
appropriate biological sources using methods known in the
art. In the exemplary embodiment of the invention, the
A. thaliana DDM1 clone was isolated from a BAC genomic
library of A. thaliana. In alternative embodiments, cDNA
clones of DDM1 may be isolated. A preferred means for
isolating DDM1 genes is PCR amplification using genomic
templates and DDM1-specific primers.

In accordance with the present invention,
nucleic acids having the appropriate level of sequence
homology with part or all of the coding regions of SEQ ID
NO:1 may be identified by using hybridization and washing
conditions of appropriate stringency. For example,
hybridizations may be performed, according to the method
of Sambrook et al., using a hybridization solution
comprising: 5X SSC, 5X Denhardt's reagent, 1.0% SDS, 100
µg/ml denatured, fragmented salmon sperm DNA, 0.05%
sodium pyrophosphate and up to 50% formamide.
Hybridization is carried out at 37-42°C for at least six
hours. Following hybridization, filters are washed as
follows: (1) 5 minutes at room temperature in 2X SSC and
1% SDS; (2) 15 minutes at room temperature in 2X SSC and
0.1% SDS; (3) 30 minutes-1 hour at 37°C in 2X SSC and
0.1% SDS; (4) 2 hours at 45-55°C in 2X SSC and 0.1% SDS,
changing the solution every 30 minutes.

One common formula for calculating the
stringency conditions required to achieve hybridization
between nucleic acid molecules of a specified sequence
homology is (Sambrook et al., 1989):

Tm = 81.5°C + 16.6 log[Na+] + 0.41(% G+C) - 0.63(% formamide) - 600/(#bp in duplex)

As an illustration of the above formula, using [Na+] =
[0.368] and 50% formamide, with GC content of 42% and an
average probe size of 200 bases, the Tm is 57°C. The Tm
of a DNA duplex decreases by 1 - 1.5°C with every 1%
decrease in homology. Thus, targets with greater than
about 75% sequence identity would be observed using a
hybridization temperature of 42°C. Such a sequence would
be considered substantially homologous to the sequences
of the present invention.
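
The arithmetic of the worked example above can be reproduced
with a few lines of Python. This is a minimal sketch of the
formula as stated in the text; the function and parameter names
are assumptions introduced here, the logarithm is taken to be
base 10, and the second helper applies a caller-chosen value
for the per-percent decrease (the text gives a range of 1 to
1.5°C).

```python
import math


def melting_temperature(na_molar: float, gc_percent: float,
                        formamide_percent: float, duplex_bp: int) -> float:
    """Tm in degrees C from the formula given in the text (log taken as base 10)."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / duplex_bp)


def tm_with_mismatch(tm: float, percent_identity: float,
                     drop_per_percent: float = 1.0) -> float:
    """Lower Tm by `drop_per_percent` degrees C for each 1% of mismatch."""
    return tm - drop_per_percent * (100.0 - percent_identity)


if __name__ == "__main__":
    # Values from the worked example: 0.368 M Na+, 42% GC, 50% formamide, 200 bases.
    tm = melting_temperature(0.368, 42.0, 50.0, 200)
    print(round(tm, 1))
```

With the values of the worked example the function returns
approximately 57°C, in agreement with the text.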

Nucleic acids of the present invention may be
maintained as DNA in any convenient cloning vector. In a
preferred embodiment, clones are maintained in a plasmid
cloning/expression vector, such as pGEM-T (Promega
Biotech, Madison, WI) or pBluescript (Stratagene, La
Jolla, CA), either of which is propagated in a suitable
E. coli host cell.

DDM1 nucleic acid molecules of the invention
include cDNA, genomic DNA, RNA, and fragments thereof
which may be single- or double-stranded. Thus, this
invention provides oligonucleotides (sense or antisense
strands of DNA or RNA) having sequences capable of
hybridizing with at least one sequence of a nucleic acid
molecule of the present invention, such as selected
segments of the DNA having SEQ ID NO:1. Such
oligonucleotides are useful as probes for detecting DDM1
genes or mRNA in test samples, e.g. by PCR amplification,
or for the positive or negative regulation of expression
of DDM1 genes at or before translation of the mRNA into
proteins.

The DDM1 promoter and other expression
regulatory sequences for DDM1 are also expected to be
useful in connection with the present invention. SEQ ID
NO:1 shows about 550 bp of sequence upstream from the
beginning of the coding region, which should contain such
expression regulatory sequences. In addition, SEQ ID
NO:3 constitutes about 5 kbp of additional upstream
sequence, which should contain other regulatory
sequences, such as enhancer elements.

2. Proteins

Polypeptides encoded by DDM1 nucleic acids of
the invention may be prepared in a variety of ways,
according to known methods. If produced in situ, the
polypeptides may be purified from appropriate sources,
e.g., plant parts.

Alternatively, the availability of nucleic acid
molecules encoding the polypeptides enables production of
the proteins using in vitro expression methods known in
the art. For example, a cDNA or gene may be cloned into
an appropriate in vitro transcription vector, such as
pSP64 or pSP65 for in vitro transcription, followed by
cell-free translation in a suitable cell-free translation
system, such as wheat germ or rabbit reticulocytes. In
vitro transcription and translation systems are
commercially available, e.g., from Promega Biotech,
Madison, Wisconsin or BRL, Rockville, Maryland.

According to a preferred embodiment, larger
quantities of DDM1-encoded polypeptide may be produced by
expression in a suitable procaryotic or eucaryotic
system. For example, part or all of a DNA molecule, such
as the coding portion of SEQ ID NO:1, may be inserted
into a plasmid vector adapted for expression in a
bacterial cell (such as E. coli) or a yeast cell (such as
Saccharomyces cerevisiae), or into a baculovirus vector
for expression in an insect cell. Such vectors comprise
the regulatory elements necessary for expression of the
DNA in the host cell, positioned in such a manner as to
permit expression of the DNA in the host cell. Such
regulatory elements required for expression include
promoter sequences, transcription initiation sequences
and, optionally, enhancer sequences.

The DDM1 polypeptide produced by gene
expression in a recombinant procaryotic or eucaryotic
system may be purified according to methods known in the
art. In a preferred embodiment, a commercially available
expression/secretion system can be used, whereby the
recombinant protein is expressed and thereafter secreted
from the host cell, to be easily purified from the
surrounding medium. If expression/secretion vectors are
not used, an alternative approach involves purifying the
recombinant protein by affinity separation, such as by
immunological interaction with antibodies that bind
specifically to the recombinant protein. Such methods
are commonly used by skilled practitioners.

The DDM1-encoded polypeptides of the invention,
prepared by the aforementioned methods, may be analyzed
according to standard procedures. Methods for analyzing
the functional activity are available. For instance, DNA
methylation levels are detectable by known methods.
Alternatively, the function of the DDM1 gene product as
part of a chromatin remodeling machine permits the use of
in vitro assays for chromatin remodeling, which are known
in the art (e.g., B.R. Cairns, Trends in Biochem. 23: 20-
25, 1998).

The present invention also provides antibodies
capable of immunospecifically binding to polypeptides of
the invention. Polyclonal or monoclonal antibodies
directed toward the polypeptide encoded by DDM1 may be
prepared according to standard methods. Monoclonal
antibodies may be prepared according to general methods
of Kohler and Milstein, following standard protocols. In
a preferred embodiment, antibodies are prepared which
react immunospecifically with various epitopes of the
DDM1-encoded polypeptides.

B. Uses of DDM1 Nucleic Acids,
Encoded Proteins and Antibodies

1. DDM1 Nucleic Acids

DDM1 nucleic acids may be used for a variety of
purposes in accordance with the present invention. The
DNA, RNA, or fragments thereof may be used as probes to
detect the presence of and/or expression of DDM1 genes.
Methods in which DDM1 nucleic acids may be utilized as
probes for such assays include, but are not limited to:
(1) in situ hybridization; (2) Southern hybridization; (3)
northern hybridization; and (4) assorted amplification
reactions such as polymerase chain reactions (PCR).

The DDM1 nucleic acids of the invention may
also be utilized as probes to identify related genes from
other species, including but not limited to, plants,
yeast, insects and mammals, including humans. As is well
known in the art and described above, hybridization
stringencies may be adjusted to allow hybridization of
nucleic acid probes with complementary sequences of
varying degrees of homology. Thus, DDM1 nucleic acids
may be used to advantage to identify and characterize
other genes of varying degrees of relation to the
exemplary coding sequence of SEQ ID NO:1, thereby
enabling further characterization of this family of
genes. Additionally, they may be used to identify genes
encoding proteins that interact with the protein encoded
by DDM1 (e.g., by the "interaction trap" technique).

As discussed above and in greater detail in
Example 1, the similarity among plant DDM1 and its
SWI2/SNF2 counterparts in yeast, Drosophila and mammals
indicates that the functional aspects of these proteins
will also be conserved. Thus, DDM1 is expected to play
an important role in DNA methylation and resultant
down-regulation of gene expression. Plants engineered to
over-express DDM1 can be expected to have improved
fidelity of the DNA methylation system. The evidence
suggests that loss of DDM1 function leads to reduction in
the efficiency of maintenance methylation due to reduced
accessibility of the methyltransferase enzyme to the
substrate. Hence, excess DDM1 function could lead to an
increase in the fidelity of the inheritance of DNA
methylation, thereby reducing the occurrence of spurious
methylation mistakes which could compromise the
organism's viability or fecundity. In fact, there are
experimental data demonstrating that loss of DDM1
function leads to stochastic hypermethylation, and
epigenetic lesion formation, as well. For these reasons,
DDM1 overexpression lines are expected to have useful
properties.
Transgenic plants expressing the DDM1 gene or
antisense nucleotides can be generated using standard
plant transformation methods known to those skilled in
the art. These include, but are not limited to,
Agrobacterium vectors, PEG treatment of protoplasts,
biolistic DNA delivery, UV laser microbeam, gemini virus
vectors, calcium phosphate treatment of protoplasts,
electroporation of isolated protoplasts, agitation of
cell suspensions with microbeads coated with the
transforming DNA, direct DNA uptake, liposome-mediated
DNA uptake, and the like. Such methods have been
published in the art. See, e.g., Methods for Plant
Molecular Biology (Weissbach & Weissbach, eds., 1988);
Methods in Plant Molecular Biology (Schuler & Zielinski,
eds., 1989); Plant Molecular Biology Manual (Gelvin,
Schilperoort, Verma, eds., 1993); and Methods in Plant
Molecular Biology - A Laboratory Manual (Maliga, Klessig,
Cashmore, Gruissem & Varner, eds., 1994).

The method of transformation depends upon the
plant to be transformed. The biolistic DNA delivery
method is useful for nuclear transformation. In another
embodiment of the invention, Agrobacterium vectors are
used to advantage for efficient transformation of plant
nuclei.

In a preferred embodiment, the gene is
introduced into plant nuclei in Agrobacterium binary
vectors. Such vectors include, but are not limited to,
BIN19 (Bevan, 1984) and derivatives thereof, the pBI
vector series (Jefferson et al., 1987), and binary
vectors pGA482 and pGA492 (An, 1986).

The DDM1 gene may be placed under a powerful
constitutive promoter, such as the Cauliflower Mosaic
Virus (CaMV) 35S promoter or the figwort mosaic virus 35S
promoter. Transgenic plants expressing the DDM1 gene
under an inducible promoter (either its own promoter or a
heterologous promoter) are also contemplated to be within
the scope of the present invention. Inducible plant
promoters include the tetracycline repressor/operator
controlled promoter.
Using an Agrobacterium binary vector system for
transformation, the DDM1 coding region, under control of
a constitutive or inducible promoter as described above,
is linked to a nuclear drug resistance marker, such as
kanamycin resistance. Agrobacterium-mediated
transformation of plant nuclei is accomplished according
to the following procedure:

(1) the gene is inserted into the selected
Agrobacterium binary vector;

(2) transformation is accomplished by
co-cultivation of plant tissue (e.g., leaf discs) with a
suspension of recombinant Agrobacterium, followed by
incubation (e.g., two days) on growth medium in the
absence of the drug used as the selective medium (see,
e.g., Horsch et al. 1985);

(3) plant tissue is then transferred onto the
selective medium to identify transformed tissue; and

(4) identified transformants are regenerated
to intact plants.

It should be recognized that the amount of
expression, as well as the tissue specificity of
expression of the DDM1 gene in transformed plants can
vary depending on the position of its insertion into
the nuclear genome. Such position effects are well known
in the art. For this reason, several nuclear
transformants should be regenerated and tested for
expression of the transgene.

In some instances, it may be desirable to
down-regulate or inhibit expression of endogenous DDM1 in
plants possessing the gene. One clear benefit to
engineering a reduction of DDM1 function is to reduce
gene (including transgene) silencing. Plant lines with
reduced or absent DDM1 function are expected to be viable
based on results obtained with Arabidopsis. Further, it
has been shown that gene silencing is suppressed in ddm1
Arabidopsis lines (Jeddeloh et al., Genes Devel. 12:1714-
1725, 1998). There are two other beneficial
characteristics of DDM1 deficient plant lines. First,
alteration in DNA methylation leads to changes in
flowering time, and as such, is a potentially powerful
tool for manipulating plant development. (See, e.g.,
Richards, Trends in Genetics 13: 319-323, 1998). Second,
ddm1 mutant lines exhibit inbreeding depression (a
reduction in vigor after inbreeding) (Richards, Trends in
Genetics, 1998, supra), a characteristic which may be
desirable to include in situations where proprietary
germplasms in hybrid plants are at risk of unauthorized
use. For instance, a genetically engineered hybrid
(containing one or more useful transgenes) could be
further engineered to down-regulate endogenous DDM1
expression. Unauthorized inbreeding of such lines would
be discouraged because the progeny of such lines would
lack vigor.

To achieve the aforementioned benefits
associated with reduced gene expression, DDM1 nucleic
acid molecules, or fragments thereof, may also be
utilized to control the production of DDM1-encoded
proteins. In one embodiment, full-length DDM1 antisense
molecules or antisense oligonucleotides, targeted to
specific regions of DDM1-encoded RNA that are critical
for translation, are used. The use of antisense
molecules to decrease expression levels of a
pre-determined gene is known in the art. In a preferred
embodiment, antisense molecules are provided in situ by
transforming plant cells with a DNA construct which, upon
transcription, produces the antisense sequences. Such
constructs can be designed to produce full-length or
partial antisense sequences.

In another embodiment, overexpression of DDM1
is induced to generate a co-suppression effect. This
excess expression serves to promote down-regulation of
both endogenous and exogenous DDM1 genes.

Optionally, transgenic plants can be created
containing mutations in the region encoding the active
site of DDM1. This embodiment may be preferred in
certain instances.

From the foregoing discussion, it can be seen
that DDM1 and its homologs will be useful for introducing
alterations in gene expression in an organism, for a
variety of purposes. As described above, for instance,
the Arabidopsis DDM1 gene can be used to isolate mutants
or engineer organisms that express reduced function of
DDM1 orthologs. Based on results in Arabidopsis, such
mutants or engineered organisms are expected to be viable
and display valuable characteristics, such as inbreeding
depression and a reduction in gene silencing. In
addition, we anticipate that dysfunction in human DDM1
orthologs may contribute to diseases that involve
alterations in DNA methylation, including cancer (Baylin,
S.B. et al., Adv. Cancer Res. 72: 141-196, 1998) and
immunodeficiency/chromosome instability/facial anomalies
syndrome (ICF) (Smeets, D.F.C.M. et al., Hum. Genet. 94:
240-246, 1994).

2. DDM1 Proteins and Antibodies

Purified DDM1-encoded proteins, or fragments
thereof, may be used to produce polyclonal or monoclonal
antibodies which also may serve as sensitive detection
reagents for the presence and accumulation of DDM1-encoded
protein in cultured cells or tissues and in intact
organisms. Recombinant techniques enable expression of
fusion proteins containing part or all of the
DDM1-encoded protein. The full length protein or fragments of
the protein may be used to advantage to generate an array
of monoclonal or polyclonal antibodies specific for
various epitopes of the protein, thereby providing even
greater sensitivity for detection of the protein in cells
or tissue.

DDM1 gene products also may be useful as
pharmaceutical agents if it is determined that DDM1 loss
of function plays a role in carcinogenesis, as mentioned
above. The gene products could be administered as
replacement therapy for persons having neoplasias
associated with DDM1 loss of function.

Polyclonal or monoclonal antibodies
immunologically specific for DDM1-encoded proteins may be
used in a variety of assays designed to detect and
quantitate the protein. Such assays include, but are not
limited to: (1) flow cytometric analysis; (2)
immunochemical localization in cultured cells or tissues;
and (3) immunoblot analysis (e.g., dot blot, Western
blot) of extracts from various cells and tissues.

Polyclonal or monoclonal antibodies that
immunospecifically interact with the polypeptide encoded
by DDM1 can be utilized for identifying and purifying
such proteins. For example, antibodies may be utilized
for affinity separation of proteins with which they
immunospecifically interact. Antibodies may also be used
to immunoprecipitate proteins from a sample containing a
mixture of proteins and other biological molecules.

The following specific examples are provided to
illustrate embodiments of the invention. They are not
intended to limit the scope of the invention in any way.

EXAMPLE 1
Map-Based Isolation of the
Arabidopsis thaliana DDM1 Gene

Construction of recombination breakpoint lines.
The recombination breakpoint lines were assembled in the
F3 generation from a parental cross between YI DDM1
ABA/YI ddm1-2 ABA (Columbia strain (Col)) and
yi DDM1 aba/yi DDM1 aba (Landsberg erecta strain
(La er)). The recessive yi mutation leads to a yellow
inflorescence. The recessive aba mutation causes a defect
in abscisic acid biosynthesis and a wilting phenotype.
Information on genetic markers and the A. thaliana
genetic map can be found at:
http://genome-www.stanford.edu/Arabidopsis/. Selfed seeds from F1
YI ddm1-2 ABA/yi DDM1 aba plants were collected and 135
F2 recombinants (yi ABA: yellow inflorescence,
non-wilting; or YI aba: green inflorescence, wilting) were
identified. Selfed seeds from 111 of the 135 recombinant
F2 individuals were planted to generate F3 tissue for
genomic DNA preparation. The genotype at the DDM1 locus
was scored in the F3 generation by Southern blot analysis
using methylation-sensitive endonucleases as described
previously (Vongs, A., Kakutani, T., Martienssen, R.A. &
Richards, E.J., Science 260: 1926-1928, 1993).
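
The phenotypic screen described above amounts to a simple
rule: because yi and aba are both recessive and entered the
cross on the same parental chromosome, an F2 plant showing
exactly one of the two recessive phenotypes must carry a
chromosome recombinant between the two markers. The Python
sketch below expresses that rule; the function name and the
boolean encoding of the phenotypes are assumptions made for
illustration and are not part of the described procedure.

```python
def is_recombinant_phenotype(yellow_inflorescence: bool, wilting: bool) -> bool:
    """Return True for the two recombinant F2 classes described in the text.

    yellow and non-wilting  -> yi ABA class (recombinant)
    green and wilting       -> YI aba class (recombinant)
    Plants showing both or neither recessive phenotype are not selected.
    """
    return yellow_inflorescence != wilting


if __name__ == "__main__":
    # Hypothetical scoring sheet: (yellow inflorescence?, wilting?)
    plants = [(True, False), (False, True), (True, True), (False, False)]
    print([is_recombinant_phenotype(y, w) for y, w in plants])
```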

Molecular markers. Two of the molecular
markers shown in Figure 1 were available from the
Arabidopsis research community: g4510 (Arabidopsis
Biological Resource Center (ABRC) stock# CD2-38) and
mi335 (ABRC stock# CD3-288). The remainder of the
molecular markers shown in Figure 1 were developed in
accordance with the present invention. sT10D21Bam is an
insert end subclone of the BAC (bacterial artificial
chromosome) clone T10D21 constructed by complete cleavage
with BamHI and recircularization. sT10D21Bam recognizes a
Col/La er PstI RFLP (restriction fragment length
polymorphism). Molecular marker A is an XbaI Col/La er
RFLP marker recognized by a 5.7 kb HindIII fragment of
the C38 cosmid insert. Marker B is a RsaI Col/La er CAPS
marker (Konieczny & Ausubel, Plant J. 4: 403-410, 1993)
(forward primer: 5'-TCAAGGAGATGATTCGGGCGT-3', SEQ ID NO:
6; reverse primer: 5'-AAAGGACCCATTTACAGAACAC-3', SEQ ID
NO: 7). The remaining markers, C and D, correspond to
RFLP's (BclI and PstI, respectively) recognized by the
succinate dehydrogenase cDNA clone, 105N23T7 (ABRC stock#
105N23T7).
Genomic library construction and screening. We
screened the available A. thaliana BAC genomic libraries
by standard colony hybridization techniques using
radiolabeled 105N23T7 insert as a probe. The clone we
subsequently focused upon, T10D21, came from the Texas
A&M University BAC library (Choi et al., Weeds World 2:
17-20, 1995). To facilitate subsequent analysis, we
cloned Sau3AI partially digested fragments from the
T10D21 insert into the BamHI site of SuperCos
(Stratagene). We chose to further characterize one
member of the resulting cosmid sublibrary, C38, which
contained genetic markers that flanked ddm1-2. The C38
cosmid was submitted on April 20, 1999, under the
provisions of the Budapest Treaty, with the American Type
Culture Collection (Manassas VA), and assigned ATCC
Accession No. 207208.



wo 99/55891 



PCT/US99/09268 



EXAMPLE 2
DDM1 Gene Structure and Identification;
Sequence Determination of DDM1 Gene

DNA sequence determination. C38 cosmid (~45 kb)
DNA, prepared using Qiagen columns and protocols, was
sonicated and 1-2 kb fragments isolated from a
low-melting temperature agarose gel. The size-selected DNA
was cloned into the SmaI site of a M13mp18 vector to
generate a shotgun library suitable for DNA sequence
determination. Single-stranded substrates were prepared
and sequenced using conventional dye-terminator cycle
sequencing protocols (Perkin-Elmer) on either an ABI 373
or ABI 377 automated DNA sequencer. The DNA sequence of
the ddm1 alleles was determined using PCR-amplified
templates and oligonucleotide primers dispersed
throughout the DDM1 gene. Sequence assembly and analysis
were accomplished using Phred/Phrap/Consed
(http://www.mbt.washington.edu/) and DNASTAR software
suites.

RT-PCR cDNA analysis. DDM1 gene structure was
determined by analysis of the genomic DNA sequence and
the nucleotide sequence of RT-PCR (reverse
transcription-polymerase chain reaction) products encompassing the
coding region. DDM1 and ddm1-2 transcripts were analyzed
by RT-PCR as follows. Total RNA was prepared using the
Qiagen RNeasy™ protocol. Poly(A)+ transcripts were
collected on oligo-d(T)25 magnetic Dynabeads (Dynal) and
first-strand cDNA synthesis performed following Dynal
protocols using Stratascript (Stratagene) reverse
transcriptase. Aliquots of the bead-immobilized
first-strand cDNA library were used as templates for PCR
amplification using KlenTaqI polymerase (Clontech). The
following oligonucleotide primers were used for the
RT-PCR experiment shown in Fig. 2b: forward,
5'-GCTGGAAGGGAAAGCTTAACAACC-3' (SEQ ID NO:8); reverse,
5'-ACACTGCCATCGATTCTGCAAACC-3' (SEQ ID NO:9).

GenBank accession numbers and SEQ ID NOS.

Arabidopsis DDM1 genomic DNA sequence: SEQ ID NO:1;
Arabidopsis DDM1 deduced amino acid sequence: SEQ ID NO:2;
Arabidopsis DDM1 5' upstream genomic DNA sequence: SEQ ID
NO:3;

Mus musculus lymphocyte specific helicase (LSH): GenBank
Accession No. AAB08015; SEQ ID NO:4;

Homo sapiens SNF2h: GenBank Accession No. AB010882; SEQ
ID NO:5;

succinate dehydrogenase cDNA 105N23T7, T22529;
primers: SEQ ID NOS: 6-9.

While certain of the preferred embodiments of
the present invention have been described and
specifically exemplified above, it is not intended that
the invention be limited to such embodiments. Various
modifications may be made thereto without departing from
the scope and spirit of the present invention, as set
forth in the following claims.

SEQUENCE LISTING

<110> Eric J. Richards
      Jeffrey A. Jeddeloh

<120> Plant Gene that Regulates DNA Methylation

<130> WashU CI-0014PCT

<150> US 60/
<151> 1998-04-30

<150> US 09/104,070
<151> 1998-06-24

<160> 9

<170> FastSEQ for Windows Version 3.0

<210> 1
<211> 5000
<212> DNA
<213> Arabidopsis thaliana

<220>
<221> CDS
<222> (535)...(566)

<221> CDS
<222> (772)...(850)

<221> CDS
<222> (986)...(1252)

<221> CDS
<222> (1354)...(1440)

<221> CDS
<222> (1549)...(1895)

<221> CDS
<222> (1976)...(2165)

<221> CDS
<222> (2251)...(2426)

<221> CDS
<222> (2559)...(2625)

<221> CDS
<222> (2703)...(2892)

<221> CDS
<222> (2975)...(3070)

<221> CDS
<222> (3148)...(3242)

<221> CDS
<222> (3317)...(3436)

<221> CDS
<222> [illegible]

<221> CDS
<222> (3745)...(3843)

<221> CDS
<222> (3934)...(4038)

<221> CDS
<222> (4130)...(4354)

<221> gene
<222> (535)...(4354)
<223> /gene="DDM1"

<221> mutation
<222> (785)...(785)
<223> /note="site of [...] replace with 82 bp"

<221> mutation
<222> (2384)...(2385)
<223> /note="site of [...] 2385"

<221> misc_feature
<222> (3186)...(3186)
<223> /note="alternate splice donor site used in ddm1-2"

<221> mutation
<222> (3243)...(3243)
<223> /note="site of ddm1-2 mutation; G to A"

<221> mutation
<222> (3337)...(3337)
<223> /note="site of ddm1-7 (som5) mutation; G to A"

<221> tRNA
<222> (4755)...(4826)
<223> /note="complement of predicted tRNA-glu"

<400> 1
tgatcatttt 
aatattgtta 
cgtgcaactg 
tcatatagtt 
agcagattta 
atgaccaaaa 
gggcgctact 
tttcaatata 
cggtgatttc
agtctgcgct 
tttccggcga 
ttggttccct 
ggtctcactg 
aatggtcagc 
caacgaagag 
aactttgtgt 
tggtcctttt 
ggaagagata 
ggaggaagag 



cttcctccgg 
ttcgttttta 
agatattctt 
tgaagcttca 
ataatgccca 
tcgtaaataa 
cccaatttaa 
ccctcggttt 
tcccgccgtt 
ccagaaaagt 
ttttctaggt 
ctctgcgtaa 
ttgatttatc 
gacgggaaaa 
gtttgttcta 
gttactcttt 
tttctgaatg 
cttctagcca 
cagctgctca 



ccaatttgca 
gccgatatca 
gacacaattt 
attcactaca 
ttccattaaa 
gggttagggg 
taaaaaataa 
tgaatttgct 
tgggtttttc 
tattccgtaa 
ccttaacgct 
ttttgtttgt 
atttctcgat 
cggagaaaga 
tgttctacta 
gtttctttaa 
tgaaggaaaa 
aaaatggaga 
aacttcggga 



gatcgaaaaa 
taactttttg 
ttgcatttga 
aaggttatta 
tgttttttag 
taaacctgtc 
gaaaataggc 
ctcaaaagcg 
ttaccggaat 
gtccctccac 
ctcgaaatcg 
cgtgtttttg 
tttggatttt 
tgcgtctggt 
ttttgccttc 
atctggggtg 
ctgtgaggag 
ttcttctctt 
agatgaagag 



tgatttagct 
agatacatta 
aattggcaat 
ctaattgtgt 
tttaataata 
atttcaagct 
gtaaatatga 
acggagacga 
ttccttctcc 
ctttcctttt 
ctcgctgttc 
gattatattc 
tggactctta 
gattcaccca 
gtagtgtggt 
ttctgtaaat 
aaaagtgtta 
atttctgaag 
aaagctaaca 



ttttattaaa 
tcaacacact 
tttgtactac 
cgacaaatcc 
ggatgatcat 
tcccgcccat 
gagtgtgttt 
ctgtttggct 
ttcgatggtt 
catttcgtta 
ttggtggttt 
tctgactatt 
gggcttcgga 
cttctgttct 
tgctttgtga 
gggtcctttt 
ctgttgtaga 
ccatggctca 
atgctggatc 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
tgctgttgct cctaatctga atgaaactca gtttactaaa cttgatgagc tcttgacgca 1200
aactcagctc tactctgagt ttctccttga gaaaatggag gatatcacaa ttgtaatctt 1260
ctttatttct ttcttctttg tggtttctca cttttcgaan 99g^gtcatt: attcttagtt 1320
tgaacaactt gtgggtgaaa tttgttttgc tagaatggga tagaaagtga gagccaaaaa 1380
gctgagcccg agaagactgg tcgtggacgc aaaagaaagg ctgcttctca gtacaacaat 1440
gtcggttcca tttatataat tttcaactac tatgcatgat cttgtatata ttgttttttc 1500
tgcttgtttg agaaagtaac ttacttggat gcttttttct tcaatcagac taaggctaag 1560
agagcggttg ctgctatgat ttcaagatct aaagaagatg gtgagaccat caactcagat 1620
ctgacagagg aagaaacagt catcaaactg cagaatgaac tttgtcctct tctcactggt 1680
ggacagttaa agtcttatca gcttaaaggt gtcaaatggc taatatcatt gtggcagaat 1740
ggtttgaatg gaatattagc tgatcaaatg ggacttggaa agacgattca aacgatcggt 1800
ttcttatcac atctgaaagg gaatgggttg gatggtccat atctagtcat tgctccactg 1860
tctacacttt caaattggtt caatgagatt gctaggtact ctcatggcca tatgtgtttg 1920
tatagatcca atgctttggg gtttctgttg aaagttttct taccttttcc attaggttca 1980
cgccttccat caatgcaatc atctaccatg gggataaaaa tcaaagggat gagctcagga 2040
ggaagcacat gcctaaaact gttggtccca agttccctat agttattact tcttatgagg 2100
ttgccatgaa tgatgccaaa agaattctgc ggcactatcc atggaaatat gttgtgattg 2160
atgaggtaaa ttccgagatt ggtcaatgta ctaggctttg aagatcaaga tgatctctcc 2220
aactgataat tttgttcttg tatattatag ggccacaggt tgaaaaacca caagtgtaaa 2280
ttgttgaggg aactaaaaca cttgaagatg gataacaaac ttctgctgac sggaacacct 2340
ctgcaaaata atctttctga gctttggtct ttgttaaatt ttattctgcc tgacatcttt 2400
acatcacatg atgaatttga atcatggtac aaacatggtc cttttctact attatcccta 2460
actagtcttc tttttttttt tttttttgtt aacactggtg gcagcttttt gacatttatt 2520
cctttcttag tatctaactg atagatgagt ctctacaggt ttgatttttc tgaaaagaac 2580
aaaaacgaag caaccaagga agaagaagag aaaagaagag ctcaagtatg tacaattata 2640
tcaattttcc tttatttctt tgattgtatt tatgtcttat gctaagggta catcttgtct 2700
aggttgtttc caaacttcat ggtatactac gaccattcat ccttcgaaga atgaaatgtg 2760
atgttgagct ctcacttcca cggaaaaagg agattataat gtatgctaca atgactgatc 2820
atcagaaaaa gttccaggaa catctggtga ataacacgtt ggaagcacat cttggagaga 2880
atgccatccg aggtacatga tctatttttt ttttttaata ctttgtttaa ttatgtcatt 2940
ttctgcattg atttgttcat cccctatact tcaggtcaag gctggaaggg aaagcttaac 3000
aacctggtca ttcaacttcg aaagaactgc aaccatcctg accttctcca ggggcaaata 3060
gatggttcat gtatgtcagf ttcttttaag aaacgtaaga aaaacttctg tcatactgtt 3120
ctgtctaatt gtttcatttc gtgacagatc tctaccctcc tgttgaagag attgttggac 3180
agtgtggtaa attccgctta ttggagagat tacttgttcg gttatttgcc aataatcaca 3240
aagtatgttt cacaaaccca tggctcgtag ctcatttccc tttgagaact tctctgatcc 3300
atttgctgat gaccaggtcc ttatcttctc ccaatggacg aaacttttgg acattatgga 3360
ttactacttc agtgagaagg ggtttgaggt ttgcagaatc gatggcagtg tgaagctgga 3420
tgaaaggaga agacaggttt cacctgtgct tatgctgctt ttgcgttgct ttcaagcaat 3480
attctgacca aatattataa ccataaggtc tctctctctc tctctttgcc ttgaaacaga 3540
ttaaagattt cagtgatgag aagagcagct gtagtatatt tctcctgagt accagagctg 3600
gaggactcgg aatcaatctt actgctgctg atacatgcat cctctatgac agcgactggg 3660
taatcaaatc aattaattta ttttctttga aggaaaatct ttctctttcg tgttgtctcc 3720
aactgtgttt tgtctgatct ccagaaccct caaatggact tgcaagccat ggacagatgc 3780
cacagaatcg ggcagacgaa acctgttcat gtttataggc tttccacggc tcagtcgata 3840
gaggtaaaac tctttgttgt tcatatcaat caatcttaac ttcaaaccat tgagattgtt 3900
gcctcatgag attggtttat gacatttgct cagacccggg ttctgaaacg agcgtacagt 3960
aagctcaagc tggaacatgt ggttattggc caagggcagt ttcatcaaga acgtgccaag 4020
tcttcaacac ctttagaggt tttaacttct cttaaagctc aatccttttt agatacactt 4080
attatcaaca aaatctccta ttgacagctt gaaccaaact aacacacagg aagaggacat 4140
actggcgttg cttaaggaag atgaaactgc tgaagataag ttgatacaaa ccgatataag 4200
cgatgcggat cttgacaggt tacttgaccg gagtgacctg acaattactg caccgggaga 4260
gacacaagct gctgaagctt ttccagtgaa gggtccaggt tgggaagtgg tcctgcctag 4320
ttcgggagga atgctgtctt ccctgaacag ttaggacaca ttaataagcc aggccttgaa 4380
accacttctg tgtttttttt ttttttttcc ggaacatgat cggttacttt tggctgggag 4440
gatttaatta ttagagggct cggaagtttt tgtaagttaa agaactcact taaaaccctg 4500
aaaacatgac agttaatggt gattagctct caatgtgatg aaaacaattg gccctctgat 4560
tttgctgttg cggtaatatt atgacttgtg tacgtttata gtctttgtag tctgcaattt 4620
tggcattgag ctatttctca cgaacttatg ggatcttatg ttttggattt gggatttgtt 4680
aacttatatg attaggctca atagtttcac agaatattaa aaacttgagt agggtttaaa 4740
aaagaagcaa aaagctccga tgccgggaat cgaacccggg tctcctgggt gaaagccaga 4800
tatcctaacc gctggacgac atcggatttg ttgatgtcta ttcttgtaaa tagtaaatat 4860
rnagttttac cggttttgca tctaatggac taaaacatga acacgagacg ccgacaagaa 4920
rgaacggggc aggcaccaaa cattcgggca aaagtatgca gtggggcatt attgacaatt 4980
cgaccaccac aagagctaat 5000



<210> 2 
<211> 764
<212> PRT 

<213> Arabidopsis thaliana 



<400> 2 



Met Val Ser Leu Arg Ser Arg Lys Val Ile Pro Ala Ser Glu Met Val
  1               5                  10                  15
Ser Asp Gly Lys Thr Glu Lys Asp Ala Ser Gly Asp Ser Pro Thr Ser
             20                  25                  30
Val Leu Asn Glu Glu Glu Asn Cys Glu Glu Lys Ser Val Thr Val Val
         35                  40                  45
Glu Glu Glu Ile Leu Leu Ala Lys Asn Gly Asp Ser Ser Leu Ile Ser
     50                  55                  60
Glu Ala Met Ala Gln Glu Glu Glu Gln Leu Leu Lys Leu Arg Glu Asp
 65                  70                  75                  80
Glu Glu Lys Ala Asn Asn Ala Gly Ser Ala Val Ala Pro Asn Leu Asn
                 85                  90                  95
Glu Thr Gln Phe Thr Lys Leu Asp Glu Leu Leu Thr Gln Thr Gln Leu
            100                 105                 110
Tyr Ser Glu Phe Leu Leu Glu Lys Met Glu Asp Ile Thr Ile Asn Gly
        115                 120                 125
Ile Glu Ser Glu Ser Gln Lys Ala Glu Pro Glu Lys Thr Gly Arg Gly
    130                 135                 140
Arg Lys Arg Lys Ala Ala Ser Gln Tyr Asn Asn Thr Lys Ala Lys Arg
145                 150                 155                 160
Ala Val Ala Ala Met Ile Ser Arg Ser Lys Glu Asp Gly Glu Thr Ile
                165                 170                 175
Asn Ser Asp Leu Thr Glu Glu Glu Thr Val Ile Lys Leu Gln Asn Glu
            180                 185                 190
Leu Cys Pro Leu Leu Thr Gly Gly Gln Leu Lys Ser Tyr Gln Leu Lys
        195                 200                 205
Gly Val Lys Trp Leu Ile Ser Leu Trp Gln Asn Gly Leu Asn Gly Ile
    210                 215                 220
Leu Ala Asp Gln Met Gly Leu Gly Lys Thr Ile Gln Thr Ile Gly Phe
225                 230                 235                 240
Leu Ser His Leu Lys Gly Asn Gly Leu Asp Gly Pro Tyr Leu Val Ile
                245                 250                 255
Ala Pro Leu Ser Thr Leu Ser Asn Trp Phe Asn Glu Ile Ala Arg Phe
            260                 265                 270
Thr Pro Ser Ile Asn Ala Ile Ile Tyr His Gly Asp Lys Asn Gln Arg
        275                 280                 285
Asp Glu Leu Arg Arg Lys His Met Pro Lys Thr Val Gly Pro Lys Phe
    290                 295                 300
Pro Ile Val Ile Thr Ser Tyr Glu Val Ala Met Asn Asp Ala Lys Arg
305                 310                 315                 320
Ile Leu Arg His Tyr Pro Trp Lys Tyr Val Val Ile Asp Glu Gly His
                325                 330                 335
Arg Leu Lys Asn His Lys Cys Lys Leu Leu Arg Glu Leu Lys His Leu
            340                 345                 350
Lys Met Asp Asn Lys Leu Leu Leu Thr Gly Thr Pro Leu Gln Asn Asn
        355                 360                 365
Leu Ser Glu Leu Trp Ser Leu Leu Asn Phe Ile Leu Pro Asp Ile Phe
    370                 375                 380
Thr Ser His Asp Glu Phe Glu Ser Trp Phe Asp Phe Ser Glu Lys Asn
385                 390                 395                 400
Lys Asn Glu Ala Thr Lys Glu Glu Glu Glu Lys Arg Arg Ala Gln Val
                405                 410                 415
Val Ser Lys Leu His Gly Ile Leu Arg Pro Phe Ile Leu Arg Arg Met
            420                 425                 430
Lys Cys Asp Val Glu Leu Ser Leu Pro Arg Lys Lys Glu Ile Ile Met
        435                 440                 445
Tyr Ala Thr Met Thr Asp His Gln Lys Lys Phe Gln Glu His Leu Val
    450                 455                 460
Asn Asn Thr Leu Glu Ala His Leu Gly Glu Asn Ala Ile Arg Gly Gln
465                 470                 475                 480
Gly Trp Lys Gly Lys Leu Asn Asn Leu Val Ile Gln Leu Arg Lys Asn
                485                 490                 495
Cys Asn His Pro Asp Leu Leu Gln Gly Gln Ile Asp Gly Ser Tyr Leu
            500                 505                 510
Tyr Pro Pro Val Glu Glu Ile Val Gly Gln Cys Gly Lys Phe Arg Leu
        515                 520                 525
Leu Glu Arg Leu Leu Val Arg Leu Phe Ala Asn Asn His Lys Val Leu
    530                 535                 540
Ile Phe Ser Gln Trp Thr Lys Leu Leu Asp Ile Met Asp Tyr Tyr Phe
545                 550                 555                 560
Ser Glu Lys Gly Phe Glu Val Cys Arg Ile Asp Gly Ser Val Lys Leu
                565                 570                 575
Asp Glu Arg Arg Arg Gln Ile Lys Asp Phe Ser Asp Glu Lys Ser Ser
            580                 585                 590
Cys Ser Ile Phe Leu Leu Ser Thr Arg Ala Gly Gly Leu Gly Ile Asn
        595                 600                 605
Leu Thr Ala Ala Asp Thr Cys Ile Leu Tyr Asp Ser Asp Trp Asn Pro
    610                 615                 620
Gln Met Asp Leu Gln Ala Met Asp Arg Cys His Arg Ile Gly Gln Thr
625                 630                 635                 640
Lys Pro Val His Val Tyr Arg Leu Ser Thr Ala Gln Ser Ile Glu Thr
                645                 650                 655
Arg Val Leu Lys Arg Ala Tyr Ser Lys Leu Lys Leu Glu His Val Val
            660                 665                 670
Ile Gly Gln Gly Gln Phe His Gln Glu Arg Ala Lys Ser Ser Thr Pro
        675                 680                 685
Leu Glu Glu Glu Asp Ile Leu Ala Leu Leu Lys Glu Asp Glu Thr Ala
    690                 695                 700
Glu Asp Lys Leu Ile Gln Thr Asp Ile Ser Asp Ala Asp Leu Asp Arg
705                 710                 715                 720
Leu Leu Asp Arg Ser Asp Leu Thr Ile Thr Ala Pro Gly Glu Thr Gln
                725                 730                 735
Ala Ala Glu Ala Phe Pro Val Lys Gly Pro Gly Trp Glu Val Val Leu
            740                 745                 750
Pro Ser Ser Gly Gly Met Leu Ser Ser Leu Asn Ser
        755                 760



<210> 3 

<211> 5000 

<212> DNA 

<213> Arabidopsis thaliana 



<400> 3 

tgtcgaagtt tccatggaag attgtgacca cgacgatgaa gctgaagatt ctggtcacgt 60 

tgaaaacctt tgttacagat ttcgcaaacg aatcgattcg ttgccataag tgttttaggt 120

gacaaagcta tcacttcagc gtctggatct gaatttagac aatcagtgag aacaactaaa 180 

aacagaaaat ttcaaactca aaaaacagaa aaaaaaaagt ttggattttt gagaagtacc 240 

aggcattcca ggaagattcc gtttcttctt cccgacggat ttaggagtta gattttggtt 300 

tccggtcgat gagacgcttg catcgccgga aactgtagag gaattatcta aatcaaccgg 360

catgtttcaa agatactaaa ttccaatctt tgaacacaaa aaggaagaag caaatctcag 420 

ctcagctcaa tctagggttt atcatcctcc tcctactctg tttagtctct ctttctctct 480 

ctcttcttca gctaccagtc aatctgcttt tcgtaaaaat ctccttttcc cctttccgcc 540 

accaaacttt tctgataact cactctctga cctctcttct tcaaaaagat ttaaaacccc 600 

caaaagaaaa agaaaaaaaa tcaaaacttc attacccaag aaatctctta atcatttaac 660 
ccagactctt tcttctccac acgcatcttt tatccaccgt ccaccgatct gatccaacgg 720
ctgagatttc accggagacg agttatcctt actacttccg gcttgtttct ctctgaagaa 780 
tcaccggaaa aaaaaataag gcggcttgtg tgtgagactt tgtgtgaaag cttcaacctt 840 
ttttttcttt ttctttggct tgtccaagaa aaaggagcct tcttcttctt ttctctctct 900 
ggagacaatt atactaattt ttttcttttc aacttttcac cctttttttt ttgttaacaa 960 

acatttttta tacacaattg tgtcgacttt caagttccaa gtatctaaat ctgtattttg 1020 

gacccccatg caaataatta aaatagaata atctttttgt agattttaaa ttgaaaacgg 1080 

tgtagaaagg ttaaaagcac caaacaaaac gagtaaatag atattgtaat aattttttca 1140 

cctttatgga aaagattata tcatagacga tgtacacaga tgaaaattag aaaatggcat 1200 

gtgaatatat gcagtaccca atgaatgcaa tatcaggttt gtattatttt tctattgtat 1260 

ctctacatgt tacgtaatca aacgatcaag taatttatta atattgtcga tggcgtagaa 1320 

attataaatt tattttatgt cattgtttac tatatagatt ttgagctaaa cgacttattt 1380 

tgtcaaaaga tatatccgtg tttggtttaa gattgggttt tagtatttcc aatattaatc 1440 

taaattctta gcttatgaac atgtcaataa acaaaaaaat tattttactg tcactgtcct 1500 

tagacgggga caaaggaggg tattaccgtc gcgttgtcgg accgtaaaat aattaaccaa 1560 

attttgttgt tgaacgaata acatttttta ctgtgggaat ttgtcgtgta gcattacgtt 1620 

cgaaatcgca atttgttttc ttctttgtgg gtgtatattt ctggttaacg aaactataac 1680 

ccaatttaat gcaatgttcg tctgtttttg ttgactttga cccttttttg gtaatattcg 1740 

ttcagctttt gttttaacgt tttcattgcc ttgtaggcat ctgagaagct cagattctga 1800 

cacgtgtctt ttgttatctg aatttgcatc cgttggataa acatgacgct gacaggtgga 1860 

ttgaaaagta accagcttgg atttccgtgt atatgttaca ccgccacttc ccttaatttc 1920 

ttcgttctta gttaaaataa aaaaggttta atttatgagt aaaagtatgt aaaacgacaa 1980 

cgattactat aagaattaaa atttatcttt gcttagtaat ttgcacttaa gattggattc 2040 

aaattttgta aaaagcgaat gttacatata tgtccattga aaaaattgca tttgacttta 2100 

caagcattga aattaattaa tttgggaccc ctttttttgt tagtttcaaa ggaagaatta 2160 

ttttaggctg agatgggtcc ctccataaac tcactattct gccagcatac aaattcctta 2220 

acatatggtc caaatagcag ttccaaccac tagtatccaa taataatctg aacaaattat 2280 

ctttcttttt ttttcctgat aatcttgtat ttgtttgttc aatgagctta atacgtatat 2340 

tagttatgac ttataactaa atactttgac tcacttgatc cgtacacatt gatttcgttt 2400 

attcaaatcc gaacaacgta atgatctttt tgggccgagt tatttgtatt ctcaacctga 2460 

gtccaaccat gctttatggg cttttctgtt tatttatgca tgtaaagttt ataatgcttg 2520 

caaataacca catattgtat gaatgtaatt actatgattt aagggcactg cttttctgtt 2580 

ttcacgttgt tttcgaaatt gctattgcgt gtgatatctg tgttggacca attattgaaa 2640 

aggacaaggc tgactctggt ttttaatgag tagtccccat gggagttatg ttcatttacc 2700 

acacattttt ttgtatagta tagtatgagt ttttatttga tatcttttat cttcggaaaa 2760 

taaatggttc aaattgtttg tctaaaaatg cacacatgaa tatcttgtgg tctcacacaa 2820 

ttgtaggaaa caaattaata tttgttgcga aaataatgtt attattttat catacgaaat 2880 

cctagagaaa atggtggcaa aagaggcaaa gactaaacta atgaatttaa aatatgaaaa 2940

tgatggaatg actggtttac caatattaca gtatattgta attttataaa aacgaatcct 3000 

gaagaagagg gcaaacccca agaccacgca aatcagtcta caaatatgaa aatttccaat 3060 

aactagaaaa acatgtgcat ttatcttttt ccatcattcg gatttttaca atggaaattt 3120 

tgaccactga gcgcaagtgt tatagtattt tattattatc caatattaat atcattattc 3180 

ggatccatgc attctatata actatgtcca ccatcttact tgtgtctatg ttgcaacttc 3240 

aacgtcgtat atatataggg attgttgtca cgaatacaat gctaattaag gaagattgtg 3300 

acttctcgga aaatttagaa ctaattaaga gtggaactaa aatgccaatg aaaatagcct 3360 

aaatcaaagg agaaccacaa atataaattg gaagacctta aaaaacaatt aaacgaggac 3420 

gaaacaaatt ttggaatcat caattatacg aaaaaaagaa gaaagaaaaa agaggtttca 3480 

tgaatcacag tagtgctgac aatcttcgaa ccatttgtgg gtttcataca atcgatcacc 3540

aatagaacaa aagagaaaca gaggaacaga aagaatagaa ggagtgggaa gtgtatgagg 3600

aagctgtgtc cgaacataga caaagacgat ggtctggaga cggtgttgga agttccgata 3660

ccggaggaga tgttttccgg tatgggcaac aacgttgcac ttaggtggca aaatatgatg 3720

acgtggatga aagctcaaac gtctgataaa tggtcgcaac cgcttatcgc cgctcgtatc 3780

aacgagctcc ggttccttct ctacctcgtt ggctcgcctc ttatacctct ccaggttcaa 3840 

gtcggtcact ctgttcataa gcccgtcaaa gattgctcca ttgtaagtca ttcaaaatca 3900 

atccttatga aaacataaca aagatgttga aaatatgatt cctctttttt ttttcttttt 3960 

ttcttttatg atcaaaaccc aaaaaagtca ttaccctgct tcgtaagtat tcaacataaa 4020 

gttgttaatc catgtgttgt actctgcaag tctgcattac attattcatc gtacacagag 4080 

tcatcaactt cagtttcatt gtttttttgc ttatgaatta cgattgcagc aagcttcaac 4140 

ggcgaaatac attgtacagc agtacatagc agcgacggga ggaccacagg cgttaaacgc 4200 

cgtgaacagc atgtgcgtca cgggacaagt gaagatgacg gcgtcggagt ttcatcaagg 4260 

agatgattcg ggcgttaatc taaagagcaa cgacgaaatg ggtggtttcg ttttatggca 4320 

aaaggatcca gatctttggt gtttggagct cgtcgtctcc ggttgcaaag tggatatgtg 4380 
gaagcaacgg tcggctttca tggcgacatt cctctaacca gcaaactccg gcgtctacsg 4440
gaacgccaar acccctccgc cggtttwtac aggtccaatc cggttattga tttttttttk 4500
gatgtaatgt ccggttctca aaatgttgaa ccggtggttt atttattgtt tggagcaggg 4560
gttaratcct cgttcgacgg cgaatctgtt tcttgacgca aacgtgtatc ggagagaaga 4620
taatcaacgg cgaggantgc tttatcttga aactggagac gagtccggcg gttcgagaag 4680
ctcaaagcgg tccgaacttt gagataattc atcacacgat atggggttat tttagtcaaa 4740
gatcgggact tttgattcag ttcgaagatt cgcggctttt gagaatgagg accaaggaag 4800
acgaagatgt ctcctgggag accagtgctg agtcggtgat ggatgattac cgatacgttg 4860
acaatgtgaa catcgctcac ggcgggaaaa catcggtcac ggttttccgg tacggtgaag 4920
cgtcggcgaa tcatcggaga cagatgacgg agaagtggag gatagaagaa gttgatttta 4980
atgttngggg tctctccgtt 5000

<210> 4
<211> 603
<212> PRT

<213> Mus musculus

<400> 4


























Met Leu Trp Glu Asn Gly Ile Asn Gly Ile Leu Ala Asp Glu Met Gly
  1               5                  10                  15
Leu Gly Lys Thr Val Gln Cys Ile Ala Thr Ile Ala Leu Met Ile Gln
             20                  25                  30
Arg Gly Val Pro Gly Pro Phe Leu Val Cys Gly Pro Leu Ser Thr Leu
         35                  40                  45
Pro Asn Trp Met Ala Glu Phe Lys Arg Phe Thr Pro Glu Ile Pro Thr
     50                  55                  60
Leu Leu Tyr His Gly Thr Arg Glu Asp Arg Arg Lys Leu Val Lys Asn
 65                  70                  75                  80
Ile His Lys Arg Gln Gly Thr Leu Gln Ile His Pro Val Val Val Thr
                 85                  90                  95
Ser Phe Glu Ile Ala Met Arg Asp Gln Asn Ala Leu Gln His Cys Tyr
            100                 105                 110
Trp Lys Tyr Leu Ile Val Asp Glu Gly His Arg Ile Lys Asn Met Lys
        115                 120                 125
Cys Arg Leu Ile Arg Glu Leu Lys Arg Phe Asn Ala Asp Asn Lys Leu
    130                 135                 140
Leu Leu Thr Gly Thr Pro Leu Gln Asn Asn Leu Ser Glu Leu Trp Ser
145                 150                 155                 160
Leu Leu Asn Phe Leu Leu Pro Asp Val Phe Asp Asp Leu Lys Ser Phe
                165                 170                 175
Glu Ser Trp Phe Asp Ile Thr Ser Leu Ser Glu Thr Ala Glu Asp Ile
            180                 185                 190
Ile Ala Lys Glu Arg Glu Gln Asn Val Leu His Met Leu His Gln Ile
        195                 200                 205
Leu Thr Pro Phe Leu Leu Arg Arg Leu Lys Ser Asp Val Ala Leu Glu
    210                 215                 220
Val Pro Pro Lys Arg Glu Val Val Val Tyr Ala Pro Leu Cys Asn Lys
225                 230                 235                 240
Gln Glu Ile Phe Tyr Thr Ala Ile Val Asn Arg Thr Ile Ala Asn Met
                245                 250                 255
Phe Gly Ser Cys Glu Lys Glu Thr Val Glu Leu Ser Pro Thr Gly Arg
            260                 265                 270
Pro Lys Arg Arg Ser Arg Lys Ser Ile Asn Tyr Ser Glu Leu Asp Gln
        275                 280                 285
Phe Pro Ser Glu Leu Glu Lys Leu Ile Ser Gln Ile Gln Pro Glu Val
    290                 295                 300
Asn Arg Glu Arg Thr Val Val Glu Gly Asn Ile Pro Ile Glu Ser Glu
305                 310                 315                 320
Val Asn Leu Lys Leu Arg Asn Ile Met Met Leu Leu Arg Lys Cys Cys
                325                 330                 335
Asn His Pro Tyr Met Ile Glu Tyr Pro Ile Asp Pro Val Thr Gln Glu
            340                 345                 350
Phe Lys Ile Asp Glu Glu Leu Val Thr Asn Ser Gly Lys Phe Leu Ile
        355                 360                 365
Leu Asp Arg Met Leu Pro Glu Leu Lys Lys Arg Gly His Lys Val Leu
    370                 375                 380
Val Phe Ser Gln Met Thr Ser Met Leu Asp Ile Leu Met Asp Tyr Cys
385                 390                 395                 400
His Leu Arg Asn Phe Ile Phe Ser Arg Leu Asp Gly Ser Met Ser Tyr
                405                 410                 415
Ser Glu Arg Glu Lys Asn Ile Tyr Ser Phe Asn Thr Asp Pro Asp Val
            420                 425                 430
Phe Leu Phe Leu Val Ser Thr Arg Ala Gly Gly Leu Gly Ile Asn Leu
        435                 440                 445
Thr Ala Ala Asp Thr Val Ile Ile Tyr Asp Ser Asp Trp Asn Pro Gln
    450                 455                 460
Ser Asp Leu Gln Ala Gln Asp Arg Cys His Arg Ile Gly Gln Thr Lys
465                 470                 475                 480
Pro Val Val Val Tyr Arg Leu Val Thr Ala Asn Thr Ile Asp Gln Lys
                485                 490                 495
Ile Val Glu Arg Ala Ala Ala Lys Arg Lys Leu Glu Lys Leu Ile Ile
            500                 505                 510
His Lys Asn His Phe Lys Gly Gly Gln Ser Gly Leu Ser Gln Ser Lys
        515                 520                 525
Asn Phe Leu Asp Ala Lys Glu Leu Met Glu Leu Leu Lys Ser Arg Asp
    530                 535                 540
Tyr Glu Arg Glu Val Lys Gly Ser Arg Glu Lys Val Ile Ser Asp Glu
545                 550                 555                 560
Asp Leu Glu Leu Leu Leu Asp Arg Ser Asp Leu Ile Asp Gln Met Lys
                565                 570                 575
Ala Ser Arg Pro Ile Lys Gly Lys Thr Gly Ile Phe Lys Ile Leu Glu
            580                 585                 590
Asn Ser Glu Asp Ser Ser Ala Glu Cys Leu Phe
        595                 600

<210> 5 

<211> 1052 

<212> PRT 

<213> Homo sapiens 



<400> 5 



Met Ser Ser Ala Ala Glu Pro Pro Pro Pro Pro Pro Pro Glu Ser Ala
  1               5                  10                  15
Pro Ser Lys Pro Ala Ala Ser Ile Ala Ser Gly Gly Ser Asn Ser Ser
             20                  25                  30
Asn Lys Gly Gly Pro Glu Gly Val Ala Ala Gln Ala Val Ala Ser Ala
         35                  40                  45
Ala Ser Ala Gly Pro Ala Asp Ala Glu Met Glu Glu Ile Phe Asp Asp
     50                  55                  60
Ala Ser Pro Gly Lys Gln Lys Glu Ile Gln Glu Pro Asp Pro Thr Tyr
 65                  70                  75                  80
Glu Glu Lys Met Gln Thr Asp Arg Ala Asn Arg Phe Glu Tyr Leu Leu
                 85                  90                  95
Lys Gln Thr Glu Leu Phe Ala His Phe Ile Gln Pro Ala Ala Gln Lys
            100                 105                 110
Thr Pro Thr Ser Pro Leu Lys Met Lys Pro Gly Arg Pro Arg Ile Lys
        115                 120                 125
Lys Asp Glu Lys Gln Asn Leu Leu Ser Val Gly Asp Tyr Arg His Arg
    130                 135                 140
Arg Thr Glu Gln Glu Glu Asp Glu Glu Leu Leu Thr Glu Ser Ser Lys
145                 150                 155                 160
Ala Thr Asn Val Cys Thr Arg Phe Glu Asp Ser Pro Ser Tyr Val Lys
                165                 170                 175
Trp Gly Lys Leu Arg Asp Tyr Gln Val Arg Gly Leu Asn Trp Leu Ile
            180                 185                 190
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- 
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XT X W 


ir X W 
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Val 
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He 
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645 








Gly 
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Phe 
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X O D 










190 
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(jXU 
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o r\ c 








Ser 
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Val 
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Trp 
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^ ^ D 










"3 c r» 
J 3 U 
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Ala 


sp 


Asp 
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Asp 
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Trp 










C 
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Arg 


Arg 


X X c 
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Asp 
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Q c: 
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4 U U 


u X LL 
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Lys 


± X C 


Tyr 


V a X 


o xy 


Leu 
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4 X3 
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X X c 




Me t 
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Asp 
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4 J U 
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4 4 3 
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Ir X U 


xyiT 
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A r\ 
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Thr 
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^ 7 Q 
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A fl n 
4 O U 
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V d X 
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X X c 
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C A O 










It XIC 
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Arg 






D D O 
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Thr 
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Val 


Val 


He 
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O / L/ 










R 7 ^ 

3^3 




Val 


Asp 
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Ala 


Met 


Asp 
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3 -7 V 






Thr 


Val 


Arg 


Val 
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Al*Cf 


Phe 


He 










605 








He 


Val 


Glu 


Arg 


Ala 


Glu 


Met 


Lvs 








620 










Gin 


Gin 


Gly Arg 


Leu 


Val 


Asp 


Gin 






635 










640 


Glu 


Met 
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Gin 


Met 


He 


Arg 


His 
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He 
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Asp 
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675 680 685 

Asn Glu Lys Leu Ser Lys Met Gly Glu Ser Ser Leu Arg Asn Phe Thr
    690                 695                 700
Met Asp Thr Glu Ser Ser Val Tyr Asn Phe Glu Gly Glu Asp Tyr Arg
705                 710                 715                 720
Glu Lys Gln Lys Ile Ala Phe Thr Glu Trp Ile Glu Pro Pro Lys Arg
                725                 730                 735
Glu Arg Lys Ala Asn Tyr Ala Val Asp Ala Tyr Phe Arg Glu Ala Leu
            740                 745                 750
Arg Val Ser Glu Pro Lys Ala Pro Lys Ala Pro Arg Pro Pro Lys Gln
        755                 760                 765
Pro Asn Val Gln Asp Phe Gln Phe Phe Pro Pro Arg Leu Phe Glu Leu
    770                 775                 780
Leu Glu Lys Glu Ile Leu Phe Tyr Arg Lys Thr Ile Gly Tyr Lys Val
785                 790                 795                 800
Pro Arg Asn Pro Glu Leu Pro Asn Ala Ala Gln Ala Gln Lys Glu Glu
                805                 810                 815
Gln Leu Lys Ile Asp Glu Ala Glu Ser Leu Asn Asp Glu Glu Leu Glu
            820                 825                 830
Glu Lys Glu Lys Leu Leu Thr Gln Gly Phe Thr Asn Trp Asn Lys Arg
        835                 840                 845
Asp Phe Asn Gln Phe Ile Lys Ala Asn Glu Lys Trp Gly Arg Asp Asp
    850                 855                 860
Ile Glu Asn Ile Ala Arg Glu Val Glu Gly Lys Thr Pro Glu Glu Val
865                 870                 875                 880
Ile Glu Tyr Ser Ala Val Phe Trp Glu Arg Cys Asn Glu Leu Gln Asp
                885                 890                 895
Ile Glu Lys Ile Met Ala Gln Ile Glu Arg Gly Glu Ala Arg Ile Gln
            900                 905                 910
Arg Arg Ile Ser Ile Lys Lys Ala Leu Asp Thr Lys Ile Gly Arg Tyr
        915                 920                 925
Lys Ala Pro Phe His Gln Leu Arg Ile Ser Tyr Gly Thr Asn Lys Gly
    930                 935                 940
Lys Asn Tyr Thr Glu Glu Glu Asp Arg Phe Leu Ile Cys Met Leu His
945                 950                 955                 960
Lys Leu Gly Phe Asp Lys Glu Asn Val Tyr Asp Glu Leu Arg Gln Cys
                965                 970                 975
Ile Arg Asn Ser Pro Gln Phe Arg Phe Asp Trp Phe Leu Lys Ser Arg
            980                 985                 990
Thr Ala Met Glu Leu Gln Arg Arg Cys Asn Thr Leu Ile Thr Leu Ile
        995                1000                1005
Glu Arg Glu Asn Met Glu Leu Glu Glu Lys Glu Lys Ala Glu Lys Lys
   1010                1015                1020
Lys Arg Gly Pro Lys Pro Ser Thr Gln Lys Arg Lys Met Asp Gly Ala
1025               1030                1035                1040
Pro Asp Gly Arg Gly Arg Lys Lys Lys Leu Lys Leu
               1045                1050

<210> 6 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> /note= "synthetic construct" 
<400> 6 

tcaaggagat gattcgggcg t 21 

<210> 7 
<211> 22 
<212> DNA 

<213> Artificial Sequence 




<220> 

<223> /note= "synthetic construct" 

<400> 7 
aaaggaccca tttacagaac ac 

<210> 8 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> /note= "synthetic construct" 

<400> 8 
gctggaaggg aaagcttaac aacc 

<210> 9 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> /note= "synthetic construct" 

<400> 9 
acactgccat cgattctgca aacc 
We claim:

1. An isolated nucleic acid molecule
comprising a gene located on Arabidopsis thaliana
chromosome 5, lower arm, said gene occupying a segment of
said chromosome 5, lower arm, flanked on the centromeric
side within 20 kilobases by a gene encoding a zinc-finger
protein and on the telomeric side within 1 kilobase by a
gene encoding a glutamic acid tRNA, the disruption of
said gene being associated with DNA hypomethylation.

2. The nucleic acid molecule of claim 1,
wherein said gene is composed of exons that form an open
reading frame having a sequence that encodes a
polypeptide about 750-850 amino acids in length.

3. A cDNA molecule comprising the exons of the
nucleic acid molecule of claim 2.

4. The nucleic acid molecule of claim 2,
wherein said open reading frame encodes an amino acid
sequence substantially the same as SEQ ID NO:2.

5. The nucleic acid molecule of claim 4,
wherein said open reading frame encodes amino acid SEQ ID
NO:2.

6. The nucleic acid molecule of claim 5, which
comprises an open reading frame of SEQ ID NO:1.

7. A recombinant DNA molecule, comprising a
vector having an insert that includes the nucleic acid
molecule of claim 1.

8. The recombinant DNA molecule of claim 7,
which is cosmid C38, ATCC Accession No. 207208.

9. An oligonucleotide between about 10 and 100
nucleotides in length, which specifically hybridizes with
a portion of the nucleic acid molecule of claim 1.

10. An isolated nucleic acid molecule which is
a gene, the disruption of which is associated with DNA
hypomethylation, having a sequence selected from the
group consisting of:

a) SEQ ID NO:1;

b) an allelic variant or natural mutant of
SEQ ID NO:1;

c) a sequence hybridizing with part or
all of SEQ ID NO:1 or its complement and encoding a
polypeptide substantially the same as part or all of a
polypeptide encoded by SEQ ID NO:1;

d) a sequence encoding part or all of a
polypeptide having amino acid SEQ ID NO:2; and

e) a sequence encoding part or all of a
polypeptide contained in the cosmid clone C38, designated
ATCC Accession No. 207208.



11. A polypeptide produced by expression of an
isolated nucleic acid molecule comprising part or all of
an open reading frame of a gene located on Arabidopsis
thaliana chromosome 5, lower arm, said gene occupying a
segment of said chromosome 5, lower arm, flanked on the
centromeric side within 20 kilobases by a gene encoding a
zinc-finger protein and on the telomeric side within 1
kilobase by a gene encoding a glutamic acid tRNA, the
disruption of said gene being associated with DNA
hypomethylation.

12. The polypeptide of claim 11, produced by
expression of a sequence selected from the group
consisting of:

a) SEQ ID NO:1;

b) an allelic variant or natural mutant of
SEQ ID NO:1;

c) a sequence hybridizing with part or
all of SEQ ID NO:1 or its complement and encoding a
polypeptide substantially the same as part or all of a
polypeptide encoded by SEQ ID NO:1;

d) a sequence encoding part or all of a
polypeptide having amino acid SEQ ID NO:2; and

e) a sequence encoding part or all of a
polypeptide contained in the clone designated ATCC
Accession No. 207208.

13. The polypeptide of claim 11, having the
amino acid sequence of part or all of SEQ ID NO:2.

14. An antibody immunologically specific for
the polypeptide of claim 11.

15. An isolated nucleic acid molecule having a
sequence substantially the same as SEQ ID NO:3.

16. An isolated protein encoded by an
Arabidopsis thaliana gene, said protein being a member of
an SWI2/SNF2 family of polypeptides, loss of function of
said protein being associated with DNA hypomethylation.

17. The protein of claim 16, encoded by a gene
located on A. thaliana chromosome 5, lower arm,
centromerically flanked within 20 kilobases by a
zinc-finger-encoding gene and telomerically within one
kilobase by a gene encoding a glutamic acid tRNA.

18. The protein of claim 16, encoded by a DNA
segment on a recombinant cosmid C38, having ATCC
Accession No. 207208.

19. The protein of claim 16, having amino acid
SEQ ID NO:2.

20. A transgenic organism comprising the
nucleic acid molecule of claim 1.

21. The transgenic organism of claim 20, which
is a plant.

22. A method of stabilizing fidelity of DNA
methylation in an organism, comprising transforming the
organism with the nucleic acid molecule of claim 1.

23. A method of reducing or eliminating gene
silencing in a plant, comprising inhibiting or preventing
expression of an endogenous DDM1 gene of the plant.

24. A method of introducing inbreeding
depression in a plant, comprising inhibiting or
preventing expression of an endogenous DDM1 gene of the
plant.
PLANT GENE THAT REGULATES DNA METHYLATION

Pursuant to 35 U.S.C. §202(c), it is
acknowledged that the U.S. Government has certain rights 
in the invention described herein, which was made in part 
with funds from the National Science Foundation, Grant 
Nos. MCB9306266 and BIR9256779. 

This application claims priority to U.S.
Provisional Application Serial No. 60/ , filed
April 30, 1998, and to U.S. Application No. 09/104,070,
filed June 24, 1998, the entireties of which are
incorporated by reference herein.

FIELD OF THE INVENTION 

This invention relates to the field of plant 
molecular biology, genetic engineering and regulation of 
gene expression. In particular, this invention provides 
a novel gene, DDM1, which plays an important role in the
regulation of DNA methylation, and resultant regulation 
of gene expression, in plant genomic DNA. 

BACKGROUND OF THE INVENTION 

Various publications or patents are cited in 
this application to describe the state of the art to 
which the invention pertains. Each of these publications 
or patents is incorporated by reference herein. 

Plant genomes contain substantial amounts of
5-methylcytosine. Up to 20-30% of the cytosines are
methylated in the nuclear genome of many flowering
plants. As in other organisms, methylation of cytosine
residues in plants occurs post-replicatively through the
action of cytosine-DNA methyltransferases. Plant DNA
methyltransferases have been characterized biochemically,
and plant genes encoding these enzymes have been isolated
by virtue of their similarity to their mammalian
counterparts.

Investigations of native plant genes and 
transgenic plants containing foreign genes have found a 
general correlation between transcriptional inactivity 
and increased DNA methylation, consistent with evidence
from mammalian systems. This evidence supports a role 
for cytosine methylation in maintaining transcriptional 
states. 

The plant's need for developmental plasticity
and environmental interaction suggests that plants
extensively employ epigenetic regulatory strategies.
Such strategies rely on heritable, often reversible,
changes in access to the underlying genetic information,
but not alteration of the primary nucleotide sequence.
As one example, the alteration of DNA methylation is
expected to perturb plant development significantly,
provided that differential DNA methylation is an
important component of epigenetic regulation in plants.

One paradigm linking DNA methylation and
developmental regulation comes from work on the mouse,
where average genome cytosine methylation levels in
embryonic lineages drop sharply in the early cleavages
following fertilization, then rise again around the time
of implantation. In plants, a similar pattern has been
observed in studies of DNA methylation content in pollen
and post-embryonic tissue of varying age. Information
from such studies indicates that there is a gradual rise
in 5-methylcytosine levels in post-embryonic tissues
produced by meristems at positions further from the base
of the plant (i.e., tissues of increasing age). Genetic
studies of transposon systems in maize also demonstrate
an age-dependent gradient of increasing epigenetic
modification, which is correlated with DNA methylation.

Both biochemical and genetic approaches have
been taken to alter DNA methylation in eucaryotic
organisms. Methylation inhibitor treatments have induced
developmental abnormalities in many plant species.
Transgenic plants expressing antisense molecules specific
for a native cytosine methyltransferase gene have been
found to exhibit genomic hypomethylation, presumably due
to the antisense interference with expression of the
gene.

In another approach, mutants of Arabidopsis
thaliana have been isolated, which show a decrease in DNA
methylation (ddm) resulting in reduced nuclear
5-methylcytosine levels. The best characterized mutations
define the DDM1 gene. Homozygotes carrying recessive
ddm1 alleles contain 30% of the wild-type levels of
5-methylcytosine. The ddm1 mutations do not map to the two
known cytosine-DNA methyltransferase genes of A.
thaliana, nor do they affect DNA methyltransferase
activity detectable in nuclear extracts (Kakutani et al.,
Nuc. Acids Res. 23: 130-137, 1995). In addition, ddm1
mutations do not appear to affect the metabolism of the
active methyl group donor, S-adenosylmethionine (Kakutani
et al., 1995, supra).

For the foregoing reasons, the DDM1 gene
product is likely to be a novel component of the DNA
methylation system, or involved in determining the
cellular context (e.g., chromatin structure, subnuclear
localization) of the methylation reaction. Consequently,
it would be a clear advance in the art of plant molecular
and cellular biology to identify and isolate the DDM1
gene and/or its encoded protein. Such a gene and protein
would find utility for the purpose of modifying the
methylation status of a selected genome and thereby
altering one or more regulatory features of gene
expression from that genome.



SUMMARY OF THE INVENTION 

A novel gene, DDM1, and its encoded protein are
provided in accordance with the present invention. The
gene has been identified as a novel element of the DNA
methylation system.

In one aspect of the invention, an isolated
nucleic acid molecule comprising a gene located on
Arabidopsis thaliana chromosome 5, lower arm, is
provided. The gene occupies a segment of chromosome 5,
lower arm, which is flanked on the centromeric side
within 20 kilobases by a gene encoding a zinc-finger
protein and on the telomeric side within 1 kilobase by a
gene encoding a glutamic acid tRNA. Disruption of the
gene is associated with DNA hypomethylation. The gene
encodes a polypeptide of about 764 amino acids in length.
The nucleotide sequence of the DDM1 gene is set forth
herein as SEQ ID NO:1 and its deduced amino acid sequence
as SEQ ID NO:2. In SEQ ID NO:1, the regions of the gene
that comprise coding sequence are indicated.

In another aspect of the invention, an isolated
DDM1 gene is provided, having a sequence selected from
the group consisting of: (a) SEQ ID NO:1; (b) an allelic
variant or natural mutant of SEQ ID NO:1; (c) a sequence
hybridizing with part or all of SEQ ID NO:1 or its
complement and encoding a polypeptide substantially the
same as part or all of a polypeptide encoded by SEQ ID
NO:1; (d) a sequence encoding part or all of a
polypeptide having amino acid SEQ ID NO:2; and (e) a
sequence encoding part or all of a polypeptide contained
in the cosmid clone C38, designated ATCC Accession No.
207208.

According to another aspect of the invention, a
polypeptide is provided, which is produced by expression
of an isolated nucleic acid molecule comprising part or
all of an open reading frame of a gene located on
Arabidopsis thaliana chromosome 5, lower arm, the gene
occupying a segment of chromosome 5, lower arm, flanked
on the centromeric side within 20 kilobases by a gene
encoding a zinc-finger protein and on the telomeric side
within 1 kilobase by a gene encoding a glutamic acid
tRNA. This polypeptide preferably has the amino acid
sequence of part or all of SEQ ID NO:2.

According to another aspect of the invention,
an isolated protein encoded by an Arabidopsis thaliana
gene is provided, which is a member of an SWI2/SNF2
family of polypeptides. Loss of function of the protein
is associated with DNA hypomethylation. The protein is
encoded by a gene located on A. thaliana chromosome 5,
lower arm, centromerically flanked within 20 kilobases by
a zinc finger-encoding gene and telomerically within one
kilobase by a gene encoding a glutamic acid tRNA.

According to another aspect of the invention, a
transgenic organism comprising the DDM1 gene is provided.
In one embodiment, the transgenic organism is a plant.

In other aspects of the invention, methods are
provided for stabilizing fidelity of DNA methylation in
an organism, which comprise transforming the organism
with the DDM1 gene. Methods are also provided for
reducing or eliminating gene silencing in a plant, or for
inducing inbreeding depression in a plant, which comprise
inhibiting or preventing expression of an endogenous DDM1
gene of the plant.

These aspects of the invention, as well as 
other features and advantages of the invention, will be 
described in greater detail in the description and 
examples set forth below. 



BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. Map-based isolation of the A.
thaliana DDM1 gene. A genetic map of the region of A.
thaliana chromosome 5 containing the DDM1 gene is shown
at the top of the figure (see Example 1). The relative
sizes of the genetic intervals were determined by the
number of recombination breakpoints (rec bkpts) scored in
a panel of recombinant lines containing cross-overs
between flanking markers yi and aha. The regions
represented in genomic clones T10D21 and C38 are denoted
by the open boxes below the genetic map. The ~30 kb
interval containing the DDM1 gene, defined by the genetic
markers A and D, is shown at the bottom of the figure.
The number of recombination breakpoints scored between
markers A - D and ddm1-2 is indicated. The positions of
predicted coding regions in the interval are numbered and
shown below the physical map. BAC, bacterial artificial
chromosome; SuDH, succinate dehydrogenase structural
gene.

Figure 2. DDM1 gene structure and
identification. Fig. 2A: The intron/exon structure of
the DDM1 gene. Protein-coding exons are shown as open
boxes, with the start and stop codons indicated. Introns
are depicted as thin lines. The position and nature of
four ddm1 alleles are indicated above the exon/intron
map. Fig. 2B: RT-PCR analysis of ddm1-2 and wild-type
DDM1 transcripts. The approximate positions of
oligonucleotide primers used in the analysis are shown
below the map in Fig. 2A. Amplifications were done on
either genomic templates (DNA), first-strand cDNA
templates (+RT, plus reverse transcriptase), or
mock-synthesized cDNA (-RT, minus reverse transcriptase).
Amplified products were separated on a 3% agarose gel and
visualized after ethidium bromide staining.
Amplification from cDNA representing the properly spliced
transcript resulted in a ~280 bp product. The nucleotide
sequence of the ~220 bp product amplified from ddm1-2
cDNA template indicated that the mutation leads to use of
an alternate splice donor 56 bp upstream of the wild-type
splice donor site.

Figure 3. The A. thaliana DDM1 gene encodes a
SWI2/SNF2-like protein. The deduced primary amino acid
sequence of DDM1 (At DDM1) is aligned with two other
SWI2/SNF2-like protein sequences, Mus musculus lymphocyte
specific helicase (Mm LSH; SEQ ID NO:4) and human SNF2h
(Hs SNF2h; SEQ ID NO:5). Sequence identities are
indicated by black boxes and conservative changes are
shaded. The positions of the eight signature motifs
characteristic of SNF2 family proteins are indicated
below the aligned sequences. Amino acid coordinates are
indicated on the left; only the N-terminal 730 amino
acids (of 1052 total) are shown for human SNF2h, though
SEQ ID NO:5 shows the entire protein sequence. The
deletion/frameshift caused by the ddm1-2 allele occurs at
amino acid 524. The ddm1-6 frameshift occurs at amino
acid 379, leading to translation of an additional 52
amino acids out of frame. The ddm1-7 nonsense mutation
occurs at amino acid 549. Dashes indicate gaps
introduced by the CLUSTAL W algorithm to maximize
alignment (Thompson et al., Nucleic Acids Res. 22:
4673-4680, 1994). The alignment was processed by
BOXSHADE v. 3.21.



DETAILED DESCRIPTION OF THE INVENTION 
I. Definitions

Various terms relating to the biological 
molecules of the present invention are used throughout 
the specification and claims. 

With reference to nucleic acids of the 
invention, the term "isolated nucleic acid" is sometimes 
used. This term, when applied to DNA, refers to a DNA 
molecule that is separated from sequences with which it 
is immediately contiguous (in the 5' and 3' directions) 
in the naturally occurring genome of the organism from 
which it was derived. For example, the "isolated nucleic 
acid" may comprise a DNA molecule inserted into a vector, 
such as a plasmid or virus vector, or integrated into the 
genomic DNA of a procaryote or eucaryote. An "isolated
nucleic acid molecule" may also comprise a cDNA molecule. 

With respect to RNA molecules of the invention,
the term "isolated nucleic acid" primarily refers to an
RNA molecule encoded by an isolated DNA molecule as
defined above. Alternatively, the term may refer to an
RNA molecule that has been sufficiently separated from
RNA molecules with which it would be associated in its
natural state (i.e., in cells or tissues), such that it
exists in a "substantially pure" form (the term
"substantially pure" is defined below).

With respect to protein, the term "isolated
protein" or "isolated and purified protein" is sometimes
used herein. This term refers primarily to a protein
produced by expression of an isolated nucleic acid
molecule of the invention. Alternatively, this term may
refer to a protein which has been sufficiently separated
from other proteins with which it would naturally be
associated, so as to exist in "substantially pure" form.

The term "substantially pure" refers to a 
preparation comprising at least 50-60% by weight the 
compound of interest (e.g., nucleic acid, 
oligonucleotide, protein, etc.). More preferably, the 
preparation comprises at least 75% by weight, and most 
preferably 90-99% by weight, the compound of interest. 
Purity is measured by methods appropriate for the 
compound of interest (e.g. chromatographic methods, 
agarose or polyacrylamide gel electrophoresis, HPLC 
analysis, and the like).

Nucleic acid sequences and amino acid sequences
can be compared using computer programs that align the
similar sequences of the nucleic or amino acids and thus
define the differences. In the comparisons made in the
present invention, the CLUSTAL W program and parameters
employed therein were utilized (Thompson et al., 1994,
supra). However, equivalent alignments and
similarity/identity assessments can be obtained through
the use of any standard alignment software. For
instance, the GCG Wisconsin Package version 9.1,
available from the Genetics Computer Group in Madison,
Wisconsin, and the default parameters used (gap creation
penalty=12, gap extension penalty=4) by that program may
also be used to compare sequence identity and similarity.

The term "substantially the same" refers to
nucleic acid or amino acid sequences having sequence
variations that do not materially affect the nature of the
protein (i.e. the structure, stability characteristics,
substrate specificity and/or biological activity of the
protein). With particular reference to nucleic acid
sequences, the term "substantially the same" is intended
to refer to the coding region and to conserved sequences
governing expression, and refers primarily to degenerate
codons encoding the same amino acid, or alternate codons
encoding conservative substitute amino acids in the
encoded polypeptide. With reference to amino acid
sequences, the term "substantially the same" refers
generally to conservative substitutions and/or variations
in regions of the polypeptide not involved in
determination of structure or function.
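By way of illustration only, the distinction between degenerate codons and codons encoding conservative substitutions can be sketched in Python. The three short DNA sequences, the partial codon table, and the function name `translate` are hypothetical and are not taken from SEQ ID NO:1; treating Leu/Ile and Glu/Asp as conservative pairs is an assumption made for this example, not the definition of Taylor (1986).

```python
# Partial standard genetic code: only the codons needed for this sketch.
CODON_TABLE = {
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "TTA": "Leu", "TTG": "Leu",
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
    "GAA": "Glu", "GAG": "Glu",
    "GAT": "Asp", "GAC": "Asp",
    "AAA": "Lys", "AAG": "Lys",
}


def translate(dna):
    """Translate a DNA coding sequence, codon by codon, into three-letter residues."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


# Hypothetical sequences chosen for illustration.
reference = "CTTGAAAAA"      # Leu Glu Lys
degenerate = "TTGGAGAAG"     # different codons, same amino acids
conservative = "ATTGACAAA"   # Leu->Ile and Glu->Asp substitutions

same_protein = translate(reference) == translate(degenerate)
substituted = translate(conservative)
```

Here `same_protein` is true although the first two DNA sequences differ at several positions, which is the degenerate-codon case; the third sequence encodes a polypeptide that differs only by substitutions of physically similar residues.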

The terms "percent identical" and "percent
similar" are also used herein in comparisons among amino
acid and nucleic acid sequences. When referring to amino
acid sequences, "percent identical" refers to the percent
of the amino acids of the subject amino acid sequence
that have been matched to identical amino acids in the
compared amino acid sequence by a sequence analysis
program. "Percent similar" refers to the percent of the
amino acids of the subject amino acid sequence that have
been matched to identical or conserved amino acids.
Conserved amino acids are those which differ in structure
but are similar in physical properties such that the
exchange of one for another would not appreciably change
the tertiary structure of the resulting protein.
Conservative substitutions are defined in Taylor (1986,
J. Theor. Biol. 119:205). When referring to nucleic acid
molecules, "percent identical" refers to the percent of
the nucleotides of the subject nucleic acid sequence that
have been matched to identical nucleotides by a sequence
analysis program.
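The two definitions above may be sketched in Python as follows. This is an illustrative sketch only, not the calculation performed by CLUSTAL W or the GCG package: the function names, the two short aligned sequences, and the grouping of physically similar residues in `SIMILAR_GROUPS` are all assumptions made for the example (the groups are not the Taylor, 1986 classification). Both percentages are taken over the residues of the subject sequence, as the definitions state.

```python
# Assumed, simplified groups of physically similar residues (one-letter codes).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("NQ"), set("ST")]


def is_conserved(a, b):
    """True if two different residues fall in the same assumed similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_similarity(subject, compared):
    """Return (percent identical, percent similar) for two aligned sequences.

    Both strings come from the same alignment, with '-' marking gaps.
    The denominator is the number of residues in the subject sequence.
    """
    if len(subject) != len(compared):
        raise ValueError("aligned sequences must have equal length")
    residues = identical = similar = 0
    for s, c in zip(subject, compared):
        if s == "-":          # gap in the subject: no subject residue here
            continue
        residues += 1
        if s == c:
            identical += 1
            similar += 1      # identical residues also count as similar
        elif c != "-" and is_conserved(s, c):
            similar += 1
    if residues == 0:
        raise ValueError("subject sequence contains no residues")
    return 100.0 * identical / residues, 100.0 * similar / residues


# Hypothetical aligned fragments, for illustration only.
pct_identical, pct_similar = percent_identity_similarity("MKLE-DST", "MRLDADSA")
```

In this hypothetical alignment the subject has seven residues, four of which are matched to identical residues and two more to conserved residues, so the two values are 4/7 and 6/7 expressed as percentages.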
With respect to antibodies, the term
"immunologically specific" refers to antibodies that bind
to one or more epitopes of a protein of interest, but
which do not substantially recognize and bind other
molecules in a sample containing a mixed population of
antigenic biological molecules.

With respect to oligonucleotides or other
single-stranded nucleic acid molecules, the term
"specifically hybridizing" refers to the association
between two single-stranded nucleic acid molecules of
sufficiently complementary sequence to permit such
hybridization under pre-determined conditions generally
used in the art (sometimes termed "substantially
complementary"). In particular, the term refers to
hybridization of an oligonucleotide with a substantially
complementary sequence contained within a single-stranded
DNA or RNA molecule, to the substantial exclusion of
hybridization of the oligonucleotide with single-stranded
nucleic acids of non-complementary sequence.
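As a simplified illustration of complementarity, the following Python sketch locates the site in a single-stranded target with which an oligonucleotide could base-pair. It assumes perfect Watson-Crick complementarity, which is stricter than the "substantially complementary" standard defined above, and it does not model hybridization conditions. The oligonucleotide is the 21-nucleotide synthetic construct of SEQ ID NO:6; the target strand and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, in lower case."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def hybridization_sites(oligo, target):
    """Return 0-based start positions in a single-stranded target where the
    oligonucleotide can pair along its full length (perfect match only)."""
    site = reverse_complement(oligo)
    target = target.lower()
    last_start = len(target) - len(site)
    return [i for i in range(last_start + 1) if target.startswith(site, i)]


primer = "tcaaggagatgattcgggcgt"  # SEQ ID NO:6 (21 nucleotides)

# Hypothetical target: the primer's complementary site flanked by arbitrary bases.
target = "ggcc" + reverse_complement(primer) + "ttaa"

sites = hybridization_sites(primer, target)
self_sites = hybridization_sites(primer, primer)  # the primer is not self-complementary
```

A practical probe design would additionally consider mismatches, melting temperature and salt concentration, none of which is represented in this sketch.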

A "coding sequence" or "coding region" refers
to a nucleic acid molecule having sequence information
necessary to produce a gene product, when the sequence is
expressed.

The term "operably linked" or "operably
inserted" means that the regulatory sequences necessary
for expression of the coding sequence are placed in a
nucleic acid molecule in the appropriate positions
relative to the coding sequence so as to enable
expression of the coding sequence. This same definition
is sometimes applied to the arrangement of other
transcription control elements (e.g. enhancers) in an
expression vector.

Transcriptional and translational control
sequences are DNA regulatory sequences, such as
promoters, enhancers, polyadenylation signals,
terminators, and the like, that provide for the
expression of a coding sequence in a host cell. In
particular, as used herein, the term "DNA transcriptional
response element" refers to a DNA sequence specifically
recognized for binding by a DNA binding protein
characterized as a transcriptional regulator (either
activator or suppressor).

The terms "promoter", "promoter region" or
"promoter sequence" refer generally to transcriptional
regulatory regions of a gene, which may be found at the
5' or 3' side of the coding region, or within the coding
region, or within introns. Typically, a promoter is a
DNA regulatory region capable of binding RNA polymerase
in a cell and initiating transcription of a downstream
(3' direction) coding sequence. The typical 5' promoter
sequence is bounded at its 3' terminus by the
transcription initiation site and extends upstream (5'
direction) to include the minimum number of bases or
elements necessary to initiate transcription at levels
detectable above background. Within the promoter
sequence is a transcription initiation site (conveniently
defined by mapping with nuclease S1), as well as protein
binding domains (consensus sequences) responsible for the
binding of RNA polymerase.

A "vector" is a replicon, such as a plasmid,
phage, cosmid, or virus to which another nucleic acid 
segment may be operably inserted so as to bring about the 
replication or expression of the segment. 

The term "nucleic acid construct" or "DNA 
construct" is sometimes used to refer to a coding 
sequence or sequences operably linked to appropriate 
regulatory sequences and inserted into a vector for 
transforming a cell. This term may be used 
interchangeably with the term "transforming DNA". Such a
nucleic acid construct may contain a coding sequence for
a gene product of interest, along with a selectable 
marker gene and/or a reporter gene. 

The term "reporter gene" refers to genetic 
sequences which may be operably linked to a promoter 
region forming a transgene, such that expression of the
reporter gene coding region is regulated by the promoter 
and expression of the transgene is readily assayed. 

The term "selectable marker gene" refers to a
gene encoding a product that, when expressed, confers a
selectable phenotype, such as antibiotic resistance, on a
transformed cell or plant.

The term "DNA construct" is sometimes used 
herein to refer to a genetic sequence used to transform
plants and generate progeny transgenic plants. These 
constructs may be administered to plants in a viral or 
plasmid vector. Other methods of delivery such as 
Agrobacterium T-DNA mediated transformation and 
transformation using the biolistic process are also 
contemplated to be within the scope of the present 
invention. The transforming DNA may be prepared 
according to standard protocols such as those set forth
in "Current Protocols in Molecular Biology", eds.
Frederick M. Ausubel et al., John Wiley & Sons, 1995.

A cell has been "transformed" or "transfected"
by an exogenous or heterologous DNA construct when such DNA
has been introduced inside the cell. The transforming
DNA may or may not be integrated (covalently linked) into
the genome of the cell. In prokaryotes, yeast, and plant
cells, for example, the transforming DNA may be maintained
on an episomal element such as a plasmid. With respect to
eukaryotic cells, a stably transformed cell is one in
which the transforming DNA has become integrated into a
chromosome so that it is inherited by daughter cells
through chromosome replication. This stability is
demonstrated by the ability of the eukaryotic cell to
establish cell lines or clones comprised of a population
of daughter cells containing the transforming DNA. A
"clone" is a population of cells derived from a single
cell or common ancestor by mitosis. A "cell line" is a
clone of a primary cell that is capable of stable growth
in vitro for many generations.



II. Description of DDM1 and its Encoded Polypeptide

In accordance with the present invention, a
novel gene, DDM1, has been isolated from the flowering
plant Arabidopsis thaliana. Through analysis of mutant
plants, this gene has been identified as important for
the maintenance of proper genomic cytosine methylation,
and its function appears to be necessary to maintain gene
silencing. Biochemical and molecular genetic results
indicate that DDM1 encodes a novel component of the DNA
methylation machinery.

We have isolated the DDM1 gene from A. thaliana
using a map-based cloning approach, which is described in
detail in Example 1 and shown in Figure 1. Briefly, the
DDM1 gene was initially localized to the bottom of the
lower arm of chromosome 5 by reference to molecular
markers segregating in an F2 family (parental cross:
Columbia ddm1/ddm1 X Landsberg erecta DDM1/DDM1). Next,
recombination breakpoints in the region surrounding a
ddm1 mutation were isolated by collecting cross-over
chromosomes by reference to flanking genetic markers.
The recombination breakpoints delimited a region of
approximately 30 kilobases. Cloned DNA corresponding to
this genomic region was isolated by subcloning DNA from a
bacterial artificial chromosome (BAC) containing
molecular markers mapping both proximal and distal to the
ddm1 marker. The nucleotide sequence of a single cosmid
subclone encompassing the 30 kb region was determined to
identify six candidate genes, in addition to a tRNA gene
and a previously identified succinate dehydrogenase
structural gene.

The search for the DDM1 gene focused on
predicted genes 5 and 6, which fell in the center of the
genetic interval defined by recombination breakpoints
with the ddm1-2 marker. The DDM1 gene was identified as
predicted gene 6 based on DNA sequence alterations in
four ddm1 alleles (Figure 2). The EMS-generated ddm1-2
mutation is a G to A transition in the splice donor site
of intron 11 that forces the use of an alternate splice
donor site 56 bp upstream in exon 11 (Fig. 2B). The
splicing defect leads to a deletion, a frameshift and
premature translation termination upstream of predicted
functional domains. The fast neutron-generated ddm1-5
allele (previously named som8; Mittelsten Scheid, O.,
Afsar, K. & Paszkowski, J. Proc. Natl. Acad. Sci. USA 95:
632-637, 1998) contains an 82 bp insertion (1 bp deleted
and replaced with 83 bp) in the second protein-coding
exon, leading to an in-frame stop after 30 codons (15
wild-type codons plus 15 codons from the insertion).
Premature translation termination is also predicted to
result from two additional fast neutron alleles: ddm1-6
(som4) corresponds to a frameshift (1 bp deletion) in
exon 7 and ddm1-7 (som5) is a nonsense mutation in exon
12. All four characterized ddm1 alleles are expected to
destroy or severely reduce gene function.
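The arithmetic underlying these allele descriptions can be checked with a short Python sketch. The variable and function names are hypothetical; the numerical values are those stated in this specification (56 bp alternate splice donor, ~280 bp wild-type RT-PCR product, 1 bp deleted and replaced with 83 bp, 1 bp deletion). The sketch relies only on the general rule that a net length change which is not a multiple of 3 bp shifts the reading frame.

```python
def shifts_reading_frame(length_change_bp):
    """A net insertion or deletion shifts the frame unless it is a multiple of 3 bp."""
    return length_change_bp % 3 != 0


# ddm1-2: use of a splice donor 56 bp upstream removes 56 bp from the transcript.
ddm1_2_deleted_bp = 56
ddm1_2_frameshift = shifts_reading_frame(-ddm1_2_deleted_bp)

# Predicted size of the ddm1-2 RT-PCR product, starting from the ~280 bp
# wild-type product; compare with the ~220 bp product described for Fig. 2B.
wild_type_product_bp = 280
predicted_ddm1_2_product_bp = wild_type_product_bp - ddm1_2_deleted_bp

# ddm1-5: 1 bp deleted and replaced with 83 bp gives the stated net insertion,
# and the stop follows 15 wild-type codons plus 15 codons from the insertion.
ddm1_5_net_insertion_bp = 83 - 1
ddm1_5_codons_before_stop = 15 + 15

# ddm1-6: a 1 bp deletion.
ddm1_6_frameshift = shifts_reading_frame(-1)
```

The predicted ddm1-2 product of 224 bp is consistent with the approximately 220 bp band, and the 56 bp deletion is not a multiple of three, consistent with the frameshift described above.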

The wild-type DDM1 gene encodes a predicted
protein of 764 amino acids with a high degree of
similarity to SWI2/SNF2-like proteins. Members of the
SWI2/SNF2 family are involved in various functions,
including transcriptional co-activation, transcriptional
co-repression, chromatin assembly and DNA repair.
Underlying these apparently diverse activities is the
modification or disruption of protein-DNA interactions by
multi-protein complexes which contain SWI2/SNF2-like
components. Figure 3 shows an alignment among the
deduced amino acid sequences of A. thaliana DDM1 and two
mammalian members of the SNF2 family, human SNF2h (SEQ ID
NO:4; Arihara, T. et al., Cytogenet. Cell Genet. 81,
191-193, 1998) and murine LSH (SEQ ID NO:5; lymphocyte
specific helicase, LSH; Jarvis, C.D. et al. Gene 169,
203-207, 1996). DDM1 contains the eight sequence motifs
diagnostic of SWI2/SNF2 family members (Bork, P. &
Koonin, E.V. Nucleic Acids Res. 21, 751-752, 1993). A.
thaliana DDM1 and human SNF2h share 45 percent identity
over the approximately 470 amino acid region comprising
the signature motifs. Over a similar region, A. thaliana
DDM1 and murine LSH display approximately 50 percent
identity, omitting the 47 residues (amino acids 276-322)
apparently unique to LSH. Initial molecular phylogenetic
analysis placed DDM1 in a small subfamily, within the
SNF2 family, which contains proteins of unknown function,
including murine LSH (Eisen, J.A. et al. Nucleic Acids
Res. 23, 2715-2723, 1995). The proteins of known function
most closely related to DDM1 are involved in chromatin
remodeling and are grouped in the SNF2L/ISWI subfamily
(Eisen et al., 1995, supra).
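
As a rough illustration of how a percent-identity figure of this kind can be derived from a pairwise alignment, the sketch below counts matching residues over ungapped alignment columns. It is only one possible convention: alignment programs differ in how they treat gaps, and the function name and the toy aligned strings are hypothetical, not taken from Figure 3.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            # Skip gap columns, e.g. residues present in only one protein.
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical): 6 ungapped columns, 5 identical.
toy_identity = percent_identity("MKT-LLVA", "MRTGLL-A")
```

Skipping gap columns mirrors the way the comparison with LSH above omits the residues unique to that protein.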

Without intending to be bound by any particular
mechanism for the functionality of the DDM1 gene product,
analysis of the foregoing data indicates that the DDM1
protein functions in the DNA methylation system by
affecting chromatin structure. Two general models for
DDM1 action are envisioned. The DDM1 protein may
function as a transcriptional co-activator, similar to
many SWI2/SNF2-like proteins, to increase the expression
of a component of the DNA methylation system. DDM1 does
not affect DNA methyltransferase expression directly
because ddm1 mutant extracts contain wild-type
methyltransferase activity (Kakutani et al., 1995,
supra). However, an unidentified positive effector of
DNA methylation may be a target. Alternatively, wild-type
DDM1 function may change chromatin structure to direct
certain sequences to the methylation machinery or to
facilitate the methylation of genomic substrates. The
recently discovered interplay between cytosine
methylation and histone acetylation, and the association
of SWI2/SNF2-like proteins and histone deacetylases in
chromatin remodeling complexes, make it plausible that
DDM1 affects DNA methylation through modulation of
histone modification or another aspect of chromatin
structure. Another possibility is that DDM1 plays a more
direct role as a part of a nucleosome remodeling complex
that increases the accessibility of the DNA
methyltransferase to the hemimethylated substrates in
newly replicated chromatin. The latter model is
particularly attractive because it predicts that ddm1
mutations will preferentially hypomethylate genomic
sequences packaged in highly condensed chromatin while
causing slower loss of methylation in more accessible
sequences, consistent with the observed hypomethylation
specificity of ddm1 mutations. The isolation of the
Arabidopsis DDM1 gene in accordance with the present
invention points to the importance of chromatin dynamics
in the maintenance of cytosine methylation patterns and
identifies a novel component of the eukaryotic DNA
methylation pathway.

A number of applications are contemplated for
the novel gene of the invention and its encoded protein,
and the discovery of the involvement of a SWI2/SNF2-like
gene in the eucaryotic DNA methylation system. Such
applications are described in greater detail below.

Although the DDM1 genomic clone from
Arabidopsis thaliana is described and exemplified herein,
this invention is intended to encompass nucleic acid
sequences and proteins from other organisms, including
plants, yeast, insects and mammals, that are sufficiently
similar to be used instead of the Arabidopsis DDM1
nucleic acid and proteins for the purposes described
below. These include, but are not limited to, allelic
variants and natural mutants of SEQ ID NO:1, which are
likely to be found in different species of plants or
varieties of Arabidopsis. Because such variants are
expected to possess certain differences in nucleotide and
amino acid sequence, this invention provides an isolated
DDM1 nucleic acid molecule having at least about 60%
(preferably 70% and more preferably over 80%) sequence
homology in the coding regions with the nucleotide
sequence set forth as SEQ ID NO:1 (and, most preferably,
specifically comprising the coding region of SEQ ID
NO:1). This invention also provides isolated polypeptide
products of the open reading frames of SEQ ID NO:1,
having at least about 60% (preferably 70% or 80% or
greater) sequence homology with the amino acid sequences
of SEQ ID NO:2. Because of the natural sequence
variation likely to exist among DDM1 genes, one skilled
in the art would expect to find up to about 30-40%
nucleotide sequence variation, while still maintaining
the unique properties of the DDM1 gene and encoded
polypeptide of the present invention. Such an
expectation is due in part to the degeneracy of the
genetic code, as well as to the known evolutionary
success of conservative amino acid sequence variations,
which do not appreciably alter the nature of the encoded
protein. Accordingly, such variants are considered
substantially the same as one another and are included
within the scope of the present invention.

The following description sets forth the
general procedures involved in practicing the present
invention. To the extent that specific materials are
mentioned, it is merely for purposes of illustration and
is not intended to limit the invention. Unless otherwise
specified, general cloning procedures, such as those set
forth in Sambrook et al., Molecular Cloning, Cold Spring
Harbor Laboratory (1989) (hereinafter "Sambrook et al.")
or Ausubel et al. (eds) Current Protocols in Molecular
Biology, John Wiley & Sons (1999) (hereinafter "Ausubel
et al.") are used.






A. Preparation of DDM1 Nucleic Acid
Molecules, Encoded Polypeptides and
Antibodies Specific for the Polypeptides

1. Nucleic Acid Molecules

DDM1 nucleic acid molecules of the invention
may be prepared by two general methods: (1) they may be
synthesized from appropriate nucleotide triphosphates, or
(2) they may be isolated from biological sources. Both
methods utilize protocols well known in the art.

The availability of nucleotide sequence
information, such as the cDNA having SEQ ID NO:1, enables
preparation of an isolated nucleic acid molecule of the
invention by oligonucleotide synthesis. Synthetic
oligonucleotides may be prepared by the phosphoramidite
method employed in the Applied Biosystems 38A DNA
Synthesizer or similar devices. The resultant construct
may be purified according to methods known in the art,
such as high performance liquid chromatography (HPLC).
Long, double-stranded polynucleotides, such as a DNA
molecule of the present invention, must be synthesized in
stages, due to the size limitations inherent in current
oligonucleotide synthetic methods. Thus, for example, a
long double-stranded molecule may be synthesized as
several smaller segments of appropriate complementarity.
Complementary segments thus produced may be annealed such
that each segment possesses appropriate cohesive termini
for attachment of an adjacent segment. Adjacent segments
may be ligated by annealing cohesive termini in the
presence of DNA ligase to construct an entire long
double-stranded molecule. A synthetic DNA molecule so
constructed may then be cloned and amplified in an
appropriate vector.
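
The staged synthesis described above can be sketched as a simple tiling problem. The example below is an illustrative assumption, not a protocol from this specification: it splits one strand into segments and offsets the breakpoints on the complementary strand, so that each annealed pair leaves a single-stranded (cohesive) terminus for its neighbor. The function names, segment length, overhang length and test sequence are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def staggered_oligos(top_strand, segment_len=40, overhang=10):
    """Split a duplex into top- and bottom-strand oligos with staggered ends.

    Top oligos tile the top strand from position 0. Bottom oligos tile the
    complementary strand with breakpoints shifted by `overhang`, so annealed
    neighbors overlap. Every oligo is written 5'->3'; the bottom list is
    ordered by top-strand coordinate.
    """
    if not 0 < overhang < segment_len:
        raise ValueError("overhang must be between 0 and segment_len")
    top_strand = top_strand.upper()
    n = len(top_strand)
    top = [top_strand[i:i + segment_len] for i in range(0, n, segment_len)]
    # Bottom-strand breakpoints expressed in top-strand coordinates.
    cuts = [0] + list(range(overhang, n, segment_len)) + [n]
    bottom = [
        reverse_complement(top_strand[a:b])
        for a, b in zip(cuts, cuts[1:])
        if b > a
    ]
    return top, bottom


# Hypothetical 100-nt test sequence.
toy_duplex = "ACGTTGCAAG" * 10
top_oligos, bottom_oligos = staggered_oligos(toy_duplex)
```

Because the breakpoints on the two strands never coincide, each junction is bridged by an oligo from the opposite strand, which is what lets the ligase join adjacent segments.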

DDM1 genes also may be isolated from
appropriate biological sources using methods known in the
art. In the exemplary embodiment of the invention, the
A. thaliana DDM1 clone was isolated from a BAC genomic
library of A. thaliana. In alternative embodiments, cDNA
clones of DDM1 may be isolated. A preferred means for
isolating DDM1 genes is PCR amplification using genomic
templates and DDM1-specific primers.

In accordance with the present invention,
nucleic acids having the appropriate level of sequence
homology with part or all of the coding regions of SEQ ID
NO:1 may be identified by using hybridization and washing
conditions of appropriate stringency. For example,
hybridizations may be performed, according to the method
of Sambrook et al., using a hybridization solution
comprising: 5X SSC, 5X Denhardt's reagent, 1.0% SDS, 100
µg/ml denatured, fragmented salmon sperm DNA, 0.05%
sodium pyrophosphate and up to 50% formamide.
Hybridization is carried out at 37-42°C for at least six
hours. Following hybridization, filters are washed as
follows: (1) 5 minutes at room temperature in 2X SSC and
1% SDS; (2) 15 minutes at room temperature in 2X SSC and
0.1% SDS; (3) 30 minutes-1 hour at 37°C in 2X SSC and
0.1% SDS; (4) 2 hours at 45-55°C in 2X SSC and 0.1% SDS,
changing the solution every 30 minutes.

One common formula for calculating the
stringency conditions required to achieve hybridization
between nucleic acid molecules of a specified sequence
homology is (Sambrook et al., 1989):

Tm = 81.5°C + 16.6 log[Na+] + 0.41(% G+C) - 0.63(% formamide) - 600/(#bp in duplex)

As an illustration of the above formula, using [Na+] =
0.368 M and 50% formamide, with GC content of 42% and an
average probe size of 200 bases, the Tm is 57°C. The Tm
of a DNA duplex decreases by 1 - 1.5°C with every 1%
decrease in homology. Thus, targets with greater than
about 75% sequence identity would be observed using a
hybridization temperature of 42°C. Such a sequence would
be considered substantially homologous to the sequences
of the present invention.
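
The worked example for the melting temperature formula can be checked with a few lines of code. This is a minimal sketch: the function and parameter names are hypothetical, and the logarithm is assumed to be base 10, as is conventional for this formula.

```python
import math


def melting_temperature(na_molar, gc_percent, formamide_percent, duplex_bp):
    """Tm in degrees C from the salt, GC, formamide and length terms.

    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C)
         - 0.63*(%formamide) - 600/(bp in duplex)
    """
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 0.63 * formamide_percent
        - 600.0 / duplex_bp
    )


# Values from the illustration in the text: [Na+] = 0.368 M,
# 50% formamide, 42% GC, 200-base probe.
tm_example = melting_temperature(0.368, 42, 50, 200)
```

With these inputs the expression evaluates to approximately 57°C.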

Nucleic acids of the present invention may be
maintained as DNA in any convenient cloning vector. In a
preferred embodiment, clones are maintained in a plasmid
cloning/expression vector, such as pGEM-T (Promega
Biotech, Madison, WI) or pBluescript (Stratagene, La
Jolla, CA), either of which is propagated in a suitable
E. coli host cell.

DDM1 nucleic acid molecules of the invention
include cDNA, genomic DNA, RNA, and fragments thereof
which may be single- or double-stranded. Thus, this
invention provides oligonucleotides (sense or antisense
strands of DNA or RNA) having sequences capable of
hybridizing with at least one sequence of a nucleic acid
molecule of the present invention, such as selected
segments of the DNA having SEQ ID NO:1. Such
oligonucleotides are useful as probes for detecting DDM1
genes or mRNA in test samples, e.g. by PCR amplification,
or for the positive or negative regulation of expression
of DDM1 genes at or before translation of the mRNA into
proteins.

The DDM1 promoter and other expression
regulatory sequences for DDM1 are also expected to be
useful in connection with the present invention. SEQ ID
NO:1 shows about 550 bp of sequence upstream from the
beginning of the coding region, which should contain such
expression regulatory sequences. In addition, SEQ ID
NO:3 constitutes about 5 kbp of additional upstream
sequence, which should contain other regulatory
sequences, such as enhancer elements.

2. Proteins

Polypeptides encoded by DDM1 nucleic acids of
the invention may be prepared in a variety of ways,
according to known methods. If produced in situ the
polypeptides may be purified from appropriate sources,
e.g., plant parts.

Alternatively, the availability of nucleic acid
molecules encoding the polypeptides enables production of
the proteins using in vitro expression methods known in
the art. For example, a cDNA or gene may be cloned into
an appropriate in vitro transcription vector, such as
pSP64 or pSP65 for in vitro transcription, followed by
cell-free translation in a suitable cell-free translation
system, such as wheat germ or rabbit reticulocytes. In
vitro transcription and translation systems are
commercially available, e.g., from Promega Biotech,
Madison, Wisconsin or BRL, Rockville, Maryland.

According to a preferred embodiment, larger
quantities of DDM1-encoded polypeptide may be produced by
expression in a suitable procaryotic or eucaryotic
system. For example, part or all of a DNA molecule, such
as the coding portion of SEQ ID NO:1, may be inserted
into a plasmid vector adapted for expression in a
bacterial cell (such as E. coli) or a yeast cell (such as
Saccharomyces cerevisiae), or into a baculovirus vector
for expression in an insect cell. Such vectors comprise
the regulatory elements necessary for expression of the
DNA in the host cell, positioned in such a manner as to
permit expression of the DNA in the host cell. Such
regulatory elements required for expression include
promoter sequences, transcription initiation sequences
and, optionally, enhancer sequences.

The DDM1 polypeptide produced by gene
expression in a recombinant procaryotic or eucaryotic
system may be purified according to methods known in the
art. In a preferred embodiment, a commercially available
expression/secretion system can be used, whereby the
recombinant protein is expressed and thereafter secreted
from the host cell, to be easily purified from the
surrounding medium. If expression/secretion vectors are
not used, an alternative approach involves purifying the
recombinant protein by affinity separation, such as by
immunological interaction with antibodies that bind
specifically to the recombinant protein. Such methods
are commonly used by skilled practitioners.




The DDM1-encoded polypeptides of the invention,
prepared by the aforementioned methods, may be analyzed
according to standard procedures. Methods for analyzing
the functional activity are available. For instance, DNA
methylation levels are detectable by known methods.
Alternatively, the function of the DDM1 gene product as
part of a chromatin remodeling machine permits the use of
in vitro assays for chromatin remodeling, which are known
in the art (e.g., B.R. Cairns, Trends in Biochem. 23:
20-25, 1998).

The present invention also provides antibodies
capable of immunospecifically binding to polypeptides of
the invention. Polyclonal or monoclonal antibodies
directed toward the polypeptide encoded by DDM1 may be
prepared according to standard methods. Monoclonal
antibodies may be prepared according to general methods
of Kohler and Milstein, following standard protocols. In
a preferred embodiment, antibodies are prepared which
react immunospecifically with various epitopes of the
DDM1-encoded polypeptides.

B. Uses of DDM1 Nucleic Acids,
Encoded Proteins and Antibodies

1. DDM1 Nucleic Acids

DDM1 nucleic acids may be used for a variety of
purposes in accordance with the present invention. The
DNA, RNA, or fragments thereof may be used as probes to
detect the presence of and/or expression of DDM1 genes.
Methods in which DDM1 nucleic acids may be utilized as
probes for such assays include, but are not limited to:
(1) in situ hybridization; (2) Southern hybridization; (3)
northern hybridization; and (4) assorted amplification
reactions such as polymerase chain reactions (PCR).

The DDM1 nucleic acids of the invention may
also be utilized as probes to identify related genes from
other species, including but not limited to, plants,
yeast, insects and mammals, including humans. As is well
known in the art and described above, hybridization
stringencies may be adjusted to allow hybridization of
nucleic acid probes with complementary sequences of
varying degrees of homology. Thus, DDM1 nucleic acids
may be used to advantage to identify and characterize
other genes of varying degrees of relation to the
exemplary coding sequence of SEQ ID NO:1, thereby
enabling further characterization of this family of
genes. Additionally, they may be used to identify genes
encoding proteins that interact with the protein encoded by
DDM1 (e.g., by the "interaction trap" technique).

As discussed above and in greater detail in
Example 1, the similarity among plant DDM1 and its
SWI2/SNF2 counterparts in yeast, Drosophila and mammals
indicates that the functional aspects of these proteins
will also be conserved. Thus, DDM1 is expected to play
an important role in DNA methylation and resultant
down-regulation of gene expression. Plants engineered to
over-express DDM1 can be expected to have improved
fidelity of the DNA methylation system. The evidence
suggests that loss of DDM1 function leads to reduction in
the efficiency of maintenance methylation due to reduced
accessibility of the methyltransferase enzyme to the
substrate. Hence, excess DDM1 function could lead to an
increase in the fidelity of the inheritance of DNA
methylation, thereby reducing the occurrence of spurious
methylation mistakes which could compromise the
organism's viability or fecundity. In fact, there are
experimental data demonstrating that loss of DDM1
function leads to stochastic hypermethylation, and
epigenetic lesion formation, as well. For these reasons,
DDM1 overexpression lines are expected to have useful
properties.

Transgenic plants expressing the DDM1 gene or
antisense nucleotides can be generated using standard
plant transformation methods known to those skilled in
the art. These include, but are not limited to,
Agrobacterium vectors, PEG treatment of protoplasts,
biolistic DNA delivery, UV laser microbeam, gemini virus
vectors, calcium phosphate treatment of protoplasts,
electroporation of isolated protoplasts, agitation of
cell suspensions with microbeads coated with the
transforming DNA, direct DNA uptake, liposome-mediated
DNA uptake, and the like. Such methods have been
published in the art. See, e.g., Methods for Plant
Molecular Biology (Weissbach & Weissbach, eds., 1988);
Methods in Plant Molecular Biology (Schuler & Zielinski,
eds., 1989); Plant Molecular Biology Manual (Gelvin,
Schilperoort, Verma, eds., 1993); and Methods in Plant
Molecular Biology - A Laboratory Manual (Maliga, Klessig,
Cashmore, Gruissem & Varner, eds., 1994).

The method of transformation depends upon the
plant to be transformed. The biolistic DNA delivery
method is useful for nuclear transformation. In another
embodiment of the invention, Agrobacterium vectors are
used to advantage for efficient transformation of plant
nuclei.

In a preferred embodiment, the gene is
introduced into plant nuclei in Agrobacterium binary
vectors. Such vectors include, but are not limited to,
BIN19 (Bevan, 1984) and derivatives thereof, the pBI
vector series (Jefferson et al., 1987), and binary
vectors pGA482 and pGA492 (An, 1986).

The DDM1 gene may be placed under a powerful
constitutive promoter, such as the Cauliflower Mosaic
Virus (CaMV) 35S promoter or the figwort mosaic virus 35S
promoter. Transgenic plants expressing the DDM1 gene
under an inducible promoter (either its own promoter or a
heterologous promoter) are also contemplated to be within
the scope of the present invention. Inducible plant
promoters include the tetracycline repressor/operator
controlled promoter.

Using an Agrobacterium binary vector system for
transformation, the DDM1 coding region, under control of
a constitutive or inducible promoter as described above,
is linked to a nuclear drug resistance marker, such as
kanamycin resistance. Agrobacterium-mediated
transformation of plant nuclei is accomplished according
to the following procedure:

(1) the gene is inserted into the selected
Agrobacterium binary vector;

(2) transformation is accomplished by
co-cultivation of plant tissue (e.g., leaf discs) with a
suspension of recombinant Agrobacterium, followed by
incubation (e.g., two days) on growth medium in the
absence of the drug used as the selective medium (see,
e.g., Horsch et al. 1985);

(3) plant tissue is then transferred onto the
selective medium to identify transformed tissue; and

(4) identified transformants are regenerated
to intact plants.

It should be recognized that the amount of
expression, as well as the tissue specificity of
expression, of the DDM1 gene in transformed plants can
vary depending on the position of its insertion into
the nuclear genome. Such position effects are well known
in the art. For this reason, several nuclear
transformants should be regenerated and tested for
expression of the transgene.

In some instances, it may be desirable to
down-regulate or inhibit expression of endogenous DDM1 in
plants possessing the gene. One clear benefit to
engineering a reduction of DDM1 function is to reduce
gene (including transgene) silencing. Plant lines with
reduced or absent DDM1 function are expected to be viable
based on results obtained with Arabidopsis. Further, it
has been shown that gene silencing is suppressed in ddm1
Arabidopsis lines (Jeddeloh et al., Genes Devel. 12:
1714-1725, 1998). There are two other beneficial
characteristics of DDM1 deficient plant lines. First,
alteration in DNA methylation leads to changes in
flowering time, and as such, is a potentially powerful
tool for manipulating plant development. (See, e.g.,
Richards, Trends in Genetics 13: 319-323, 1998). Second,
ddm1 mutant lines exhibit inbreeding depression (a
reduction in vigor after inbreeding) (Richards, Trends in
Genetics, 1998, supra), a characteristic which may be
desirable to include in situations where proprietary
germplasms in hybrid plants are at risk of unauthorized
use. For instance, a genetically engineered hybrid
(containing one or more useful transgenes) could be
further engineered to down-regulate endogenous DDM1
expression. Unauthorized inbreeding of such lines would
be discouraged because the progeny of such lines would
lack vigor.

To achieve the aforementioned benefits
associated with reduced gene expression, DDM1 nucleic
acid molecules, or fragments thereof, may also be
utilized to control the production of DDM1-encoded
proteins. In one embodiment, full-length DDM1 antisense
molecules or antisense oligonucleotides, targeted to
specific regions of DDM1-encoded RNA that are critical
for translation, are used. The use of antisense
molecules to decrease expression levels of a
pre-determined gene is known in the art. In a preferred
embodiment, antisense molecules are provided in situ by
transforming plant cells with a DNA construct which, upon
transcription, produces the antisense sequences. Such
constructs can be designed to produce full-length or
partial antisense sequences.

In another embodiment, overexpression of DDM1
is induced to generate a co-suppression effect. This
excess expression serves to promote down-regulation of
both endogenous and exogenous DDM1 genes.

Optionally, transgenic plants can be created
containing mutations in the region encoding the active
site of DDM1. This embodiment may be preferred in
certain instances.

From the foregoing discussion, it can be seen
that DDM1 and its homologs will be useful for introducing
alterations in gene expression in an organism, for a
variety of purposes. As described above, for instance,
the Arabidopsis DDM1 gene can be used to isolate mutants
or engineer organisms that express reduced function of
DDM1 orthologs. Based on results in Arabidopsis, such
mutants or engineered organisms are expected to be viable
and display valuable characteristics, such as inbreeding
depression and a reduction in gene silencing. In
addition, we anticipate that dysfunction in human DDM1
orthologs may contribute to diseases that involve
alterations in DNA methylation, including cancer (Baylin,
S.B. et al., Adv. Cancer Res. 72: 141-196, 1998) and
immunodeficiency/chromosome instability/facial anomalies
syndrome (ICF) (Smeets, D.F.C.M. et al., Hum. Genet. 94:
240-246, 1994).



2. DDM1 Proteins and Antibodies

Purified DDM1-encoded proteins, or fragments
thereof, may be used to produce polyclonal or monoclonal
antibodies which also may serve as sensitive detection
reagents for the presence and accumulation of DDM1-encoded
protein in cultured cells or tissues and in intact
organisms. Recombinant techniques enable expression of
fusion proteins containing part or all of the
DDM1-encoded protein. The full length protein or fragments of
the protein may be used to advantage to generate an array
of monoclonal or polyclonal antibodies specific for
various epitopes of the protein, thereby providing even
greater sensitivity for detection of the protein in cells
or tissue.

DDM1 gene products also may be useful as
pharmaceutical agents if it is determined that DDM1 loss
of function plays a role in carcinogenesis, as mentioned
above. The gene products could be administered as
replacement therapy for persons having neoplasias
associated with DDM1 loss of function.

Polyclonal or monoclonal antibodies
immunologically specific for DDM1-encoded proteins may be
used in a variety of assays designed to detect and
quantitate the protein. Such assays include, but are not
limited to: (1) flow cytometric analysis; (2)
immunochemical localization in cultured cells or tissues;
and (3) immunoblot analysis (e.g., dot blot, Western
blot) of extracts from various cells and tissues.

Polyclonal or monoclonal antibodies that
immunospecifically interact with the polypeptide encoded
by DDM1 can be utilized for identifying and purifying
such proteins. For example, antibodies may be utilized
for affinity separation of proteins with which they
immunospecifically interact. Antibodies may also be used
to immunoprecipitate proteins from a sample containing a
mixture of proteins and other biological molecules.






The following specific examples are provided to
illustrate embodiments of the invention. They are not
intended to limit the scope of the invention in any way.

EXAMPLE 1

Map-Based Isolation of the
Arabidopsis thaliana DDM1 Gene

Construction of recombination breakpoint lines.
The recombination breakpoint lines were assembled in the
F3 generation from a parental cross between YI DDM1
ABA/YI ddm1-2 ABA (Columbia strain (Col)) and
yi DDM1 aba/yi DDM1 aba (Landsberg erecta strain
(La er)). The recessive yi mutation leads to a yellow
inflorescence. The recessive aba mutation causes a defect
in abscisic acid biosynthesis and a wilting phenotype.
Information on genetic markers and the A. thaliana
genetic map can be found at:
http://genome-www.stanford.edu/Arabidopsis/. Selfed seeds from F1
YI ddm1-2 ABA/yi DDM1 aba plants were collected and 135
F2 recombinants (yi ABA: yellow inflorescence,
non-wilting; or YI aba: green inflorescence, wilting) were
identified. Selfed seeds from 111 of the 135 recombinant
F2 individuals were planted to generate F3 tissue for
genomic DNA preparation. The genotype at the DDM1 locus
was scored in the F3 generation by Southern blot analysis
using methylation-sensitive endonucleases as described
previously (Vongs, A., Kakutani, T., Martienssen, R.A. &
Richards, E.J., Science 260: 1926-1928, 1993).

Molecular markers. Two of the molecular
markers shown in Figure 1 were available from the
Arabidopsis research community: g4510 (Arabidopsis
Biological Resource Center (ABRC) stock# CD2-38) and
mi335 (ABRC stock# CD3-288). The remainder of the
molecular markers shown in Figure 1 were developed in
accordance with the present invention. sT10D21Bam is an
insert end subclone of the BAC (bacterial artificial
chromosome) clone T10D21 constructed by complete cleavage
with BamHI and recircularization. sT10D21Bam recognizes a
Col/La er PstI RFLP (restriction fragment length
polymorphism). Molecular marker A is an XbaI Col/La er
RFLP marker recognized by a 5.7 kb HindIII fragment of
the C38 cosmid insert. Marker B is an RsaI Col/La er CAPS
marker (Konieczny & Ausubel, Plant J. 4: 403-410, 1993)
(forward primer: 5'-TCAAGGAGATGATTCGGGCGT-3', SEQ ID NO:6;
reverse primer: 5'-AAAGGACCCATTTACAGAACAC-3', SEQ ID
NO:7). The remaining markers, C and D, correspond to
RFLPs (BclI and PstI, respectively) recognized by the
succinate dehydrogenase cDNA clone, 105N23T7 (ABRC stock#
105N23T7).
Genomic library construction and screening. We
screened the available A. thaliana BAC genomic libraries
by standard colony hybridization techniques using
radiolabeled 105N23T7 insert as a probe. The clone we
subsequently focused upon, T10D21, came from the Texas
A&M University BAC library (Choi et al., Weeds World 2:
17-20, 1995). To facilitate subsequent analysis, we
cloned Sau3AI partially digested fragments from the
T10D21 insert into the BamHI site of SuperCos
(Stratagene). We chose to further characterize one
member of the resulting cosmid sublibrary, C38, which
contained genetic markers that flanked ddm1-2. The C38
cosmid was submitted on April 20, 1999, under the
provisions of the Budapest Treaty, with the American Type
Culture Collection (Manassas VA), and assigned ATCC
Accession No. 207208.

DDM1 Gene Structure and Identification;
Sequence Determination of DDM1 Gene

DNA sequence determination. C38 cosmid (~45 kb)
DNA, prepared using Qiagen columns and protocols, was
sonicated and 1-2 kb fragments isolated from a
low-melting temperature agarose gel. The size-selected DNA
was cloned into the SmaI site of an M13mp18 vector to
generate a shotgun library suitable for DNA sequence
determination. Single-stranded substrates were prepared
and sequenced using conventional dye-terminator cycle
sequencing protocols (Perkin-Elmer) on either an ABI 373
or ABI 377 automated DNA sequencer. The DNA sequence of
the ddm1 alleles was determined using PCR-amplified
templates and oligonucleotide primers dispersed
throughout the DDM1 gene. Sequence assembly and analysis
were accomplished using Phred/Phrap/Consed
(http://www.mbt.washington.edu/) and DNASTAR software
suites.

RT-PCR cDNA analysis. DDM1 gene structure was 
determined by analysis of the genomic DNA sequence and 
the nucleotide sequence of RT-PCR (reverse transcription- 
polymerase chain reaction) products encompassing the 
coding region. DDM1 and ddm1-2 transcripts were analyzed 
by RT-PCR as follows. Total RNA was prepared using the 
Qiagen RNeasy™ protocol. Poly(A)+ transcripts were 
collected on oligo-d(T)25 magnetic Dynabeads (Dynal), and 
first-strand cDNA synthesis was performed following Dynal 
protocols using Stratascript (Stratagene) reverse 
transcriptase. Aliquots of the bead-immobilized first- 
strand cDNA library were used as templates for PCR 
amplification using KlenTaqI polymerase (Clontech). The 
following oligonucleotide primers were used for the 
RT-PCR experiment shown in Fig. 2b: forward, 
5'-GCTGGAAGGGAAAGCTTAACAACC-3' (SEQ ID NO:8); reverse, 
5'-ACACTGCCATCGATTCTGCAAACC-3' (SEQ ID NO:9). 
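
The placement of these two primers can be checked against the genomic sequence with a short script. The sketch below uses three rows copied from the SEQ ID NO:1 listing and the CDS coordinates from its feature table. The function names are illustrative, and the amplicon sizes it computes are derived here from the listing coordinates for the wild-type gene structure; they are not values reported in the text.

```python
# Rows copied from the SEQ ID NO:1 listing; each key is the 1-based
# coordinate of the first base of a contiguous stretch.
SEGMENTS = {
    2941: ("ttctgcattg atttgttcat cccctatact tcaggtcaag gctggaaggg aaagcttaac"
           "aacctggtca ttcaacttcg aaagaactgc aaccatcctg accttctcca ggggcaaata"),
    3361: "ttactacttc agtgagaagg ggtttgaggt ttgcagaatc gatggcagtg tgaagctgga",
}

FORWARD = "GCTGGAAGGGAAAGCTTAACAACC"   # SEQ ID NO:8
REVERSE = "ACACTGCCATCGATTCTGCAAACC"   # SEQ ID NO:9

# CDS segments from the feature table that lie between the two primers.
CDS_SEGMENTS = [(2975, 3070), (3148, 3242), (3317, 3436)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def locate(site: str):
    """Return 1-based (start, end) of `site` in the excerpted rows, or None."""
    site = site.lower()
    for first_base, text in SEGMENTS.items():
        seq = text.replace(" ", "")
        offset = seq.find(site)
        if offset >= 0:
            return first_base + offset, first_base + offset + len(site) - 1
    return None


def overlap_length(start: int, end: int, segments) -> int:
    """Number of bases in [start, end] that fall inside the given segments."""
    return sum(max(0, min(end, e) - max(start, s) + 1) for s, e in segments)


fwd_site = locate(FORWARD)                      # primer matches the top strand
rev_site = locate(reverse_complement(REVERSE))  # primer matches the bottom strand
genomic_size = rev_site[1] - fwd_site[0] + 1
spliced_size = overlap_length(fwd_site[0], rev_site[1], CDS_SEGMENTS)

if __name__ == "__main__":
    print("forward primer site:", fwd_site)
    print("reverse primer site:", rev_site)
    print("genomic amplicon (bp):", genomic_size)
    print("spliced amplicon (bp):", spliced_size)
```

Because the primers sit in different coding segments, a product amplified from correctly spliced cDNA is shorter than one amplified from genomic DNA, which is what allows an RT-PCR assay of this design to reveal altered splicing such as that noted for ddm1-2 in the sequence listing.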

GenBank accession numbers and SEQ ID NOS. 

Arabidopsis DDM1 genomic DNA sequence: SEQ ID NO:1; 
Arabidopsis DDM1 deduced amino acid sequence: SEQ ID NO:2; 
Arabidopsis DDM1 5' upstream genomic DNA sequence: SEQ ID 
NO:3; 

Mus musculus lymphocyte specific helicase (LSH); GenBank 
Accession No. AAB08015; SEQ ID NO:4; 

Homo sapiens SNF2h; GenBank Accession No. AB010882; SEQ 
ID NO:5; 

succinate dehydrogenase cDNA 105N23T7, T22529; 
primers: SEQ ID NOS:6-9. 

While certain of the preferred embodiments of 
the present invention have been described and 
specifically exemplified above, it is not intended that 
the invention be limited to such embodiments. Various 
modifications may be made thereto without departing from 
the scope and spirit of the present invention, as set 
forth in the following claims. 

SEQUENCE LISTING 

<110> Eric J. Richards 

Jeffrey A. Jeddeloh 

<120> Plant Gene that Regulates DNA Methylation 

<130> WashU CI-0014PCT 

<150> US 60/ 

<151> 1998-04-30 

<150> US 09/104,070 
<151> 1998-06-24 

<160> 9 

<170> FastSEQ for Windows Version 3.0 

<210> 1 

<211> 5000 

<212> DNA 

<213> Arabidopsis thaliana 

<220> 
<221> CDS 

<222> (535) . . . (566) 
<221> CDS 

<222> (772) . . . (850) 
<221> CDS 

<222> (986) . . . (1252) 
<221> CDS 

<222> (1354) . . . (1440) 
<221> CDS 

<222> (1549) . . . (1895) 
<221> CDS 

<222> (1976) . . . (2165) 
<221> CDS 

<222> (2251) . . . (2426) 
<221> CDS 

<222> (2559) . . . (2625) 
<221> CDS 

<222> (2703) . . . (2892) 
<221> CDS 

<222> (2975) . . . (3070) 
<221> CDS 

<222> (3148) . . . (3242) 
<221> CDS 

<222> (3317) . . . (3436) 
<221> CDS 

<222> (3540) . . . (3659) 
<221> CDS 

<222> (3745) . . . (3843) 
<221> CDS 

<222> (3934) . . . (4038) 
<221> CDS 

<222> (4130) . . . (4354) 
<221> gene 

<222> (535) . . . (4354) 

<223> /gene= "DDM1" 

<221> mutation 

<222> (785) . . . (785) 

<223> /note= "site of ddm1-5 (som8) mutation; delete G at 785 and replace with 82 bp" 

<221> mutation 

<222> (2384) . . . (2385) 

<223> /note= "site of ddm1-6 (som4) mutation; delete G at 2384 or 2385" 

<221> misc_feature 
<222> (3186) . . . (3186) 

<223> /note= "alternate splice donor site used in ddm1-2" 

<221> mutation 

<222> (3243) . . . (3243) 

<223> /note= "site of ddm1-2 mutation; G to A" 

<221> mutation 

<222> (3337) . . . (3337) 

<223> /note= "site of ddm1-7 (som5) mutation; G to A" 
<221> tRNA 

<222> (4755) . . . (4826) 

<223> /note= "complement of predicted tRNA-glu" 
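
The feature table above can be checked for internal consistency with a few lines of code. The sketch below sums the sixteen CDS segments and compares the total with the 764-residue length given for SEQ ID NO:2, then reports where each annotated mutation falls relative to the coding segments. The function names and the segment numbering (in listing order) are illustrative conventions introduced here, not part of the listing.

```python
# CDS segments (1-based, inclusive) transcribed from the <221> CDS entries.
CDS_SEGMENTS = [
    (535, 566), (772, 850), (986, 1252), (1354, 1440),
    (1549, 1895), (1976, 2165), (2251, 2426), (2559, 2625),
    (2703, 2892), (2975, 3070), (3148, 3242), (3317, 3436),
    (3540, 3659), (3745, 3843), (3934, 4038), (4130, 4354),
]

PROTEIN_LENGTH = 764  # <211> value given for SEQ ID NO:2

# First annotated coordinate of each <221> mutation entry.
MUTATIONS = {"ddm1-5": 785, "ddm1-6": 2384, "ddm1-2": 3243, "ddm1-7": 3337}


def coding_length(segments) -> int:
    """Total number of bases covered by the CDS segments."""
    return sum(end - start + 1 for start, end in segments)


def classify(position: int, segments) -> str:
    """Describe a coordinate relative to the CDS segments (numbered in listing order)."""
    for number, (start, end) in enumerate(segments, start=1):
        if start <= position <= end:
            return f"within CDS segment {number}"
        if position == end + 1 and number < len(segments):
            return f"first intron base after CDS segment {number}"
    return "outside the CDS segments"


total = coding_length(CDS_SEGMENTS)
locations = {name: classify(pos, CDS_SEGMENTS) for name, pos in MUTATIONS.items()}

if __name__ == "__main__":
    print("coding bases:", total, "=", total // 3, "codons including the stop")
    for name, where in locations.items():
        print(f"{name}: {where}")
```

The segments sum to 2295 bases, i.e. 764 codons plus a stop codon, which agrees with the stated protein length. The ddm1-2 coordinate falls on the first base after a coding segment, consistent with the annotation that this allele uses an alternate splice donor site.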



<400> 1 

tgatcatttt cttcctccgg ccaatttgca gatcgaaaaa tgatttagct ttttattaaa 60 

aatattgtta ttcgttttta gccgatatca taactttttg agatacatta tcaacacact 120 

cgtgcaactg agatattctt gacacaattt ttgcatttga aattggcaat tttgtactac 180 

tcatatagtt tgaagcttca attcactaca aaggttatta ctaattgtgt cgacaaatcc 240 

agcagattta ataatgccca ttccattaaa tgttttttag tttaataata ggatgatcat 300 

atgaccaaaa tcgtaaataa gggttagggg taaacctgtc atttcaagct tcccgcccat 360 

gggcgctact cccaatttaa taaaaaataa gaaaataggc gtaaatatga gagtgtgttt 420 

tttcaatata ccctcggttt tgaatttgct ctcaaaagcg acggagacga ctgtttggct 480 

cggtgatttc tcccgccgtt tgggtttttc ttaccggaat ttccttctcc ttcgatggtt 540 

agtctgcgct ccagaaaagt tattccgtaa gtccctccac ctttcctttt catttcgtta 600 

tttccggcga ttttctaggt ccttaacgct ctcgaaatcg ctcgctgttc ttggtggttt 660 

ttggttccct ctctgcgtaa ttttgtttgt cgtgtttttg gattatattc tctgactatt 720 

ggtctcactg ttgatttatc atttctcgat tttggatttt tggactctta gggcttcgga 780 

aatggtcagc gacgggaaaa cggagaaaga tgcgtctggt gattcaccca cttctgttct 840 

caacgaagag gtttgttcta tgttctacta ttttgccttc gtagtgtggt tgctttgtga 900 

aactttgtgt gttactcttt gtttctttaa atctggggtg ttctgtaaat gggtcctttt 960 

tggtcctttt tttctgaatg tgaaggaaaa ctgtgaggag aaaagtgtta ctgttgtaga 1020 

ggaagagata cttctagcca aaaatggaga ttcttctctt atttctgaag ccatggctca 1080 

ggaggaagag cagctgctca aacttcggga agatgaagag aaagctaaca atgctggatc 1140 
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tgctgttgct cctaatctga atgaaactca gtttactaaa cttgatgagc tcttgacgca 1200 

aactcagctc tactctgagt ttctccttga gaaaatggag gatatcacaa ttgtaatctt 1260 

ctttatttct ttcttctttg tggtttctca cttttcgaat gggagtcatt attcttagtt 1320 

tgaacaactt gtgggtgaaa tttgttttgc tagaatggga tagaaagtga gagccaaaaa 1380 

gctgagcccg agaagactgg tcgtggacgc aaaagaaagg ctgcttctca gtacaacaat 1440 

gttggttcca tttatataat tttcaactac tatgcatgat cttgtatata ttgttttttc 1500 

tgcttgtttg agaaagtaac ttacttggat gcttttttct tcaatcagac taaggctaag 1560 

agagcggttg ctgctatgat ttcaagatct aaagaagatg gtgagaccat caactcagat 1620 

ctgacagagg aagaaacagt catcaaactg cagaatgaac tttgtcctct tcccactggt 1680 

ggacagttaa agtcttatca gcttaaaggt gtcaaatggc taatatcatt gtggcagaat 1740 

ggtttgaatg gaatattagc tgatcaaatg ggacttggaa agacgattca aacgatcggt 1800 

ttcttatcac atctgaaagg gaatgggttg gatggtccat atctagtcat tgctccactg 1860 

tctacacttt caaattggtt caatgagatt gctaggtact ctcatggcca tatgtgtttg 1920 

tatagatcca atgctttggg gtttctgttg aaagttttct taccttttcc attaggttca 1980 

cgccttccat caatgcaatc atctaccatg gggataaaaa tcaaagggat gagctcagga 2040 

ggaagcacat gcctaaaact gttggtccca agttccctat agttattact tcttatgagg 2100 

ttgccatgaa tgatgctaaa agaattctgc ggcactatcc atggaaatat gttgtgattg 2160 

atgaggtaaa ttccgagatt ggtcaatgta ctaggctttg aagatcaaga tgatctctct 2220 

aactgataat tttgttcttg tatattatag ggccacaggt tgaaaaacca caagtgtaaa 2280 

ttgttgaggg aactaaaaca cttgaagatg gataacaaac ttctgctgac aggaacacct 2340 

ctgcaaaata atctttctga gctttggtct ttgttaaatt ttattctgcc tgacatcttt 2400 

acatcacatg atgaatttga atcatggtac aaacatggtc cttttctact attatccita 2460 

actagtcttc tttttttttt tttttttgtt aacactggtg gcagcttttt gacatttatt 2520 

cctttcttag tatctaactg atagatgagt ctctacaggt ttgatttttc tgaaaagaac 2580 

aaaaacgaag caaccaagga agaagaagag aaaagaagag ctcaagtatg tacaattata 2640 

tcaattttcc tttatttctt tgattgtatt tatgtcttat gctaagggta catcttgtct 2700 

aggttgtttc caaacttcat ggtatactac gaccattcat ccttcgaaga atgaaatgtg 2760 

atgttgagct ctcacttcca cggaaaaagg agattataat gtatgctaca atgactgatc 2820 

atcagaaaaa gttccaggaa catctggtga ataacacgtt ggaagcacat cttggagaga 2880 

atgccatccg aggtacatga tctatttttt ttttttaata ctttgtttaa ttatgtcatt 2940 

ttctgcattg atttgttcat cccctatact tcaggtcaag gctggaaggg aaagcttaac 3000 

aacctggtca ttcaacttcg aaagaactgc aaccatcctg accttctcca ggggcaaata 3060 

gatggttcat gtatgtcagt ttcttttaag aaacgtaaga aaaacttctg tcatactgtt 3120 

ctgtctaatt gtttcatttc gtgacagatc tctaccctcc tgttgaagag attgttggac 3180 

agtgtggtaa attccgctta ttggagagat tacttgttcg gttatttgcc aataatcaca 3240 

aagtatgttt cacaaaccca tggctcgtag ctcatttccc tttgagaact tctctgatcc 3300 

atttgctgat gaccaggtcc ttatcttctc ccaatggacg aaacttttgg acattatgga 3360 

ttactacttc agtgagaagg ggtttgaggt ttgcagaatc gatggcagtg tgaagctgga 3420 

tgaaaggaga agacaggttt cacctgtgct tatgctgctt ttgcgttgct tttaagcaat 3480 

attctgacca aatattataa ccataaggtc tctctctctc tctctttgcc ttgaaacaga 3540 

ttaaagattt cagtgatgag aagagcagct gtagtatatt tctcctgagt accagagctg 3600 

gaggactcgg aatcaatctt actgctgctg atacatgcat cctctatgac agcgactggg 3660 

taatcaaatc aattaattta ttttctttga aggaaaatct ttctctttcg tgttgtctcc 3720 

aactgtgttt tgtctgatct ccagaaccct caaatggact tgcaagccat ggacagatgc 3780 

cacagaatcg ggcagacgaa acctgttcat gtttataggc tttccacggc tcagtcgata 3840 

gaggtaaaac tctttgttgt tcatatcaat caatcttaac ttcaaaccat tgagattgtt 3900 

gcctcatgag attggtttat gacatttgct cagacccggg ttctgaaacg agcgtacagt 3960 

aagctcaagc tggaacatgt ggttattggc caagggcagt ttcatcaaga acgtgccaag 4020 

tcttcaacac ctttagaggt tttaacttct cttaaagctc aatccttttt agatacactt 4080 

attatcaaca aaatctccta ttgacagctt gaaccaaact aacacacagg aagaggacat 4140 

actggcgttg cttaaggaag atgaaactgc tgaagataag ttgatacaaa ccgatataag 4200 

cgatgcggat cttgacaggt tacttgaccg gagtgacctg acaattactg caccgggaga 4260 

gacacaagct gctgaagctt ttccagtgaa gggtccaggt tgggaagtgg tcctgcctag 4320 

ttcgggagga atgctgtctt ccctgaacag ttaggacaca ttaataagcc aggccttgaa 4380 

accacttctg tgtttttttt ttttttttcc ggaacatgat cggttacttt tggctgggag 4440 

gatttaatta ttagagggct cggaagtttt tgtaagttaa agaactcact taaaaccctg 4500 

aaaacatgac agttaatggt gattagctct caatgtgatg aaaacaattg gccctctgat 4560 

tttgctgttg cggtaatatt atgacttgtg tacgtttata gtctttgtag tctgcaattt 4620 

tggcattgag ctatttctca cgaacttatg ggatcttatg ttttggattt gggatttgtt 4680 

aacttatatg attaggctca atagtttcac agaatattaa aaacttgagt agggtttaaa 4740 

aaagaagcaa aaagctccga tgccgggaat cgaacccggg tctcctgggt gaaagccaga 4800 

tatcctaacc gctggacgac atcggatttg ttgatgtcta ttcttgtaaa tagtaaatat 4860 
ttagttttat cggttttgca tctaatggac taaaacatga acacgagacg ccgacaagaa 4920 

tgaatggggc aggcaccaaa catttgggta aaagtatgca gtggggtatt attgacaatt 4980 

tgaccatcac aagagctaat 5000 

<210> 2 
<211> 764 
<212> PRT 

<213> Arabidopsis thaliana 
<400> 3 

tgtcgaagtt tccatggaag attgtgacca cgacgatgaa gctgaagatt ctggtcacgt 60 

tgaaaacctt tgttacagat ttcgcaaacg aatcgattcg ttgccataag tgttttaggt 120 

gacaaagcta tcacttcagc gtctggatct gaatttagac aatcagtgag aacaactaaa 180 

aacagaaaat ttcaaactca aaaaacagaa aaaaaaaagt ttggattttt gagaagtacc 240 

aggcattcca ggaagattcc gtttcttctt cccgacggat ttaggagtta gattttggtt 300 

tccggtcgat gagacgcttg catcgccgga aactgtagag gaattatcta aatcaaccgg 360 

catgtttcaa agatactaaa ttccaatctt tgaacacaaa aaggaagaag caaatctcag 420 

ctcagctcaa tctagggttt atcatcctcc tcctactctg tttagtctct ctttctctct 480 

ctcttcttca gctaccagtc aatctgcttt tcgtaaaaat ctccttttcc cctttccgcc 540 

accaaacttt tctgataact cactctctga cctctcttct tcaaaaagat ttaaaacccc 600 

caaaagaaaa agaaaaaaaa tcaaaacttc attacccaag aaatctctta atcatttaac 660 
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ccagactctt 
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cctttatgga 
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ctctacatgt 
attataaatt 
tgtcaaaaga 
taaattctta 
tagacgggga 
attttgttgt 
cgaaatcgca 
ccaatttaat 
ttcagctttt 
cacgtgtctt 
ttgaaaagta 
ttcgttctta 
cgattactat 
aaattttgta 
caagcattga 
ttttaggctg 
acatatggtc 
ctttcttttt 
tagttatgac 
attcaaatcc 
gtccaaccat 
caaataacca 
ttcacgttgt 
aggacaaggc 
acacattttt 
taaatggttc 
ttgtaggaaa 
cctagagaaa 
tgatggaatg 
gaagaagagg 
aactagaaaa 
tgaccactga 
ggatccatgc 
aacgtcgtat 
acttctcgga 
aaatcaaagg 
gaaacaaatt 
tgaatcacag 
aatagaacaa 
aagctgtgtc 
ccggaggaga 
acgtggatga 
aacgagctcc 
gtcggtcact 
atccttatga 
ttcttttatg 
gttgttaatc 
tcatcaactt 
ggcgaaatac 
cgtgaacagc 
agatgattcg 
aaaggatcca 



tcttctccac 
accggagacg 
aaaaaataag 
ttctttggct 
atactaattt 
tacataattg 
caaataatta 
ttaaaagcac 
aaagattata 
gcagtaccca 
tacgtaatca 
tattttatgt 
tatatccgtg 
get tatgaac 
caaaggaggg 
tgaacgaata 
atttgttttc 
gcaatgttcg 
gttttaacgt 
ttgttatctg 
accagcttgg 
gttaaaataa 
aagaattaaa 
aaaagcgaat 
aattaattaa 
agatgggtcc 
caaatagcag 
ttttcctgat 
ttataactaa 
gaacaacgta 
gctttatggg 
catattgtat 
tttcgaaatt 
tgactctggt 
ttgtatagta 
aaattgtttg 
caaattaata 
atggtggcaa 
actggtttac 
gcaaacccca 
acatgtgcat 
gcgcaagtgt 
attctatata 
atatataggg 
aaatttagaa 
agaaccacaa 
ttggaatcat 
tagtgctgac 
aagagaaaca 
cgaacataga 
tgttttccgg 
aagctcaaac 
ggttccttct 
ctgttcataa 
aaacataaca 
atcaaaaccc 
catgtgttgt 
cagtttcatt 
attgtacagc 
atgtgcgtca 
ggcgttaatc 
gatctttggt 



acgcatcttt 
agttatcctt 
gcggcttgtg 
tgtccaagaa 
ttttcttttc 
tgtcgacttt 
aaatagaata 
caaacaaaac 
tcatagacga 
atgaatgcaa 
aacgatcaag 
cattgtttac 
tttggtttaa 
atgtcaataa 
tattaccgtc 
acatttttta 
ttctttgtgg 
tctgtttttg 
tttcattgcc 
aatttgcatc 
atttctgtgt 
aaaaggttta 
atttatcttt 
gttacatata 
tttgggaccc 
ctccataaac 
ttccaaccac 
aatcttgtat 
atactttgac 
atgatctttt 
cttttctgtt 
gaatgtaatt 
gctattgcgt 
ttttaatgag 
tagtatgagt 
tctaaaaatg 
tttgttgcga 
aagaggcaaa 
caatattaca 
agaccacgca 
ttatcttttt 
tatagtattt 
actatgtcca 
attgttgtca 
ctaattaaga 
atataaattg 
caattatacg 
aatcttcgaa 
gaggaacaga 
caaagacgat 
tatgggcaac 
gtctgataaa 
ctacctcgtt 
gcccgtcaaa 
aagatgttga 
aaaaaagtca 
actctgcaag 
gtttttttgc 
agtacatagc 
cgggacaagt 
taaagagcaa 
gtttggagct 



tatccaccgt 
actacttccg 
tgtgagactt 
aaaggagcct 
aacttttcac 
caagttccaa 
atctttttgt 
gagtaaatag 
tgtacacaga 
tatcaggttt 
taatttatta 
tatatagatt 
gattgggttt 
acaaaaaaat 
gcgttgtcgg 
ctgtgggaat 
gtgtatattt 
ttgactttga 
ttgtaggcat 
cgttggataa 
atatgttaca 
atttatgagt 
gcttagtaat 
tgtccattga 
ctttttttgt 
tcactattct 
tagtatccaa 
ttgtttgttc 
tcacttgatc 
tgggccgagt 
tatttatgca 
actatgattt 
gtgatatctg 
tagtccccat 
ttttatttga 
cacacatgaa 
aaataatgtt 
gactaaacta 
gtatattgta 
aatcagtcta 
ccatcattcg 
tattattatc 
ccatcttact 
cgaatacaat 
gtggaactaa 
gaagacctta 
aaaaaaagaa 
ccatttgtgg 
aagaatagaa 
ggtctggaga 
aacgttgcac 
tggtcgcaac 
ggctcgcctc 
gattgctcca 
aaatatgatt 
ttaccctgct 
tctgcattac 
ttatgaatta 
agcgacggga 
gaagatgacg 
cgacgaaatg 
cgtcgtctcc 



ccaccgatct 
gcttgtttct 
tgtgtgaaag 
tcttcttctt 
cctttttttt 
gtatctaaat 
agattttaaa 
atattgtaat 
tgaaaattag 
gtattatttt 
atattgtcga 
ttgagctaaa 
tagtatttcc 
tattttactg 
accgtaaaat 
ttgtcgtgta 
ctggttaacg 
cccttttttg 
ctgagaagct 
acatgacgct 
ccgccacttc 
aaaagtatgt 
ttgcacttaa 
aaaaattgca 
tagtttcaaa 
gccagcatac 
taataatctg 
aatgagctta 
cgtacacatt 
tatttgtatt 
tgtaaagttt 
aagggcactg 
tgttggacca 
gggagttatg 
tatcttttat 
tatcttgtgg 
attattttat 
atgaatttaa 
attttataaa 
caaatatgaa 
gatttttaca 
caatattaat 
tgtgtctatg 
gctaattaag 
aatgccaatg 
aaaaacaatt 
gaaagaaaaa 
gtttcataca 
ggagtgggaa 
cggtgttgga 
ttaggtggca 
cgcttatcgc 
ttatacctct 
ttgtaagtca 
cctctttttt 
tcgtaagtat 
attattcatc 
cgattgcagc 
ggaccacagg 
gcgtcggagt 
ggtggtttcg 
ggttgcaaag 



gatccaacgg 
ctctgaagaa 
cttcaacctt 
ttctctctct 
ttgttaacaa 
ctgtattttg 
ttgaaaacgg 
aattttttca 
aaaatggcat 
tctattgtat 
tggcgtagaa 
cgacttattt 
aatattaatc 
tcactgtcct 
aattaaccaa 
gcattacgtt 
aaactataac 
gtaatattcg 
cagattctga 
gacaggtgga 
ccttaatttc 
aaaacgacaa 
gattggattc 
tttgacttta 
ggaagaatta 
aaattcctta 
aacaaattat 
atacgtatat 
gatttcgttt 
ctcaacctga 
ataatgcttg 
cttttctgtt 
attattgaaa 
ttcatttacc 
cttcggaaaa 
tctcacacaa 
catacgaaat 
aatatgaaaa 
aacgaatcct 
aatttccaat 
atggaaattt 
atcattattc 
ttgcaacttc 
gaagattgtg 
aaaatagcct 
aaacgaggac 
agaggtttca 
atcgatcacc 
gtgtatgagg 
agttccgata 
aaatatgatg 
cgctcgtatc 
ccaggttcaa 
ttcaaaatca 
ttttcttttt 
tcaacataaa 
gtacacagag 
aagcttcaac 
cgttaaacgc 
ttcatcaagg 
ttttatggca 
tggatatgtg 
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900 
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1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
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1620 
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1800 
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1920 
1980 
2 04 0 
2100 
2160 
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2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600 
3660 
3720 
3780 
3840 
3900 
3960 
4020 
4080 
4140 
4200 
4260 
4320 
4380 
gaagcaacgg tcggctttca tggcgacatt cctctaacca gcaaactccg gcgtctacsg 4440 

gaacgccaar acctctccgc cggtttwtac aggtccaatc cggttattga tttttttttk 4500 

gatgtaatgt ccggttctca aaatgttgaa ccggtggttt atttattgtt tggagcaggg 4560 

gttaratcct cgttcgacgg cgaatctgtt tcttgacgca aacgtgtatc ggagagaaga 4620 

taatcaacgg tgaggattgc tttatcttga aactggagac gagtccggcg gttcgagaag 4680 

ctcaaagcgg tccgaatttt gagataattc atcacacgat atggggttat tttagtcaaa 4740 

gatcgggact tttgattcag ttcgaagatt cgcggctttt gagaatgagg accaaggaag 4800 

acgaagatgt cttctgggag actagtgctg agtcggtgat ggatgattac cgatacgttg 4860 

acaatgtgaa catcgctcac ggcgggaaaa catcggtcac ggttttccgg tacggtgaag 4920 

cgtcggcgaa tcatcggaga cagatgacgg agaagtggag gatagaagaa gttgatttta 4980 
atgtttgggg tctctccgtt 

Phe Lys Ile Asp Glu Glu Leu Val Thr Asn Ser Gly Lys Phe Leu Ile 
355 

Leu Asp Arg Met Leu Pro Glu Leu Lys Lys Arg Gly His Lys Val Leu 

Val Phe Ser Gln Met Thr Ser Met Leu Asp Ile Leu Met Asp Tyr Cys 

His Leu Arg Asn Phe Ile Phe Ser Arg Leu Asp Gly Ser Met Ser Tyr 

410 

Ser Glu Arg Glu Lys Asn Ile Tyr Ser Phe Asn Thr Asp Pro Asp Val 

Phe Leu Phe Leu Val Ser Thr Arg Ala Gly Gly Leu Gly Ile Asn Leu 

Thr Ala Ala Asp Thr Val Ile Ile Tyr Asp Ser Asp Trp Asn Pro Gln 

455 460 

Ser Asp Leu Gln Ala Gln Asp Arg Cys His Arg Ile Gly Gln Thr Lys 

470 475 

Pro Val Val Val Tyr Arg Leu Val Thr Ala Asn Thr Ile Asp Gln Lys 

Ile Val Glu Arg Ala Ala Ala Lys Arg Lys Leu Glu Lys Leu Ile Ile 

500 505 510 

His Lys Asn His Phe Lys Gly Gly Gln Ser Gly Leu Ser Gln Ser Lys 

515 520 525 

Asn Phe Leu Asp Ala Lys Glu Leu Met Glu Leu Leu Lys Ser Arg Asp 

530 535 

Tyr Glu Arg Glu Val Lys Gly Ser Arg Glu Lys Val Ile Ser Asp Glu 

550 555 

Asp Leu Glu Leu Leu Leu Asp Arg Ser Asp Leu Ile Asp Gln Met Lys 

570 

Ala Ser Arg Pro Ile Lys Gly Lys Thr Gly Ile Phe Lys Ile Leu Glu 

585 590 
Asn Ser Glu Asp Ser Ser Ala Glu Cys Leu Phe 
595 600 

<210> 5 
<211> 1052 
<212> PRT 

<213> Homo sapiens 
<400> 5 

Met Ser Ser Ala Ala Glu Pro Pro Pro Pro Pro Pro Pro Glu Ser Ala 

1 5 10 15 

Pro Ser Lys Pro Ala Ala Ser Ile Ala Ser Gly Gly Ser Asn Ser Ser 

20 25 30 

Asn Lys Gly Gly Pro Glu Gly Val Ala Ala Gln Ala Val Ala Ser Ala 

35 40 45 

Ala Ser Ala Gly Pro Ala Asp Ala Glu Met Glu Glu Ile Phe Asp Asp 

50 55 60 

Ala Ser Pro Gly Lys Gln Lys Glu Ile Gln Glu Pro Asp Pro Thr Tyr 
65 70 75 80 

Glu Glu Lys Met Gln Thr Asp Arg Ala Asn Arg Phe Glu Tyr Leu Leu 

85 90 95 

Lys Gln Thr Glu Leu Phe Ala His Phe Ile Gln Pro Ala Ala Gln Lys 

100 105 
Thr Pro Thr Ser Pro Leu Lys Met Lys Pro Gly Arg Pro Arg Ile Lys 

115 120 125 

Lys Asp Glu Lys Gln Asn Leu Leu Ser Val Gly Asp Tyr Arg His Arg 
130 135 

Arg Thr Glu Gln Glu Glu Asp Glu Glu Leu Leu Thr Glu Ser Ser Lys 
145 150 155 

Ala Thr Asn Val Cys Thr Arg Phe Glu Asp Ser Pro Ser Tyr Val Lys 

165 170 175 

Trp Gly Lys Leu Arg Asp Tyr Gln Val Arg Gly Leu Asn Trp Leu Ile 
680 685 
Asn Glu Lys Leu Ser Lys Met Gly Glu Ser Ser Leu Arg Asn Phe Thr 

695 700 
Met Asp Thr Glu Ser Ser Val Tyr Asn Phe Glu Gly Glu Asp Tyr Arg 

710 715 720 

Glu Lys Gln Lys Ile Ala Phe Thr Glu Trp Ile Glu Pro Pro Lys Arg 

725 730 735 

Glu Arg Lys Ala Asn Tyr Ala Val Asp Ala Tyr Phe Arg Glu Ala Leu 

740 745 
Arg Val Ser Glu Pro Lys Ala Pro Lys Ala Pro Arg Pro Pro Lys Gln 

760 765 
Pro Asn Val Gln Asp Phe Gln Phe Phe Pro Pro Arg Leu Phe Glu Leu 

770 775 780 

Leu Glu Lys Glu Ile Leu Phe Tyr Arg Lys Thr Ile Gly Tyr Lys Val 

800 
Pro Arg Asn Pro Glu Leu Pro Asn Ala Ala Gln Ala Gln Lys Glu Glu 

805 810 815 

Gln Leu Lys Ile Asp Glu Ala Glu Ser Leu Asn Asp Glu Glu Leu Glu 

820 825 830 

Glu Lys Glu Lys Leu Leu Thr Gln Gly Phe Thr Asn Trp Asn Lys Arg 

835 840 845 

Asp Phe Asn Gln Phe Ile Lys Ala Asn Glu Lys Trp Gly Arg Asp Asp 

855 860 
Ile Glu Asn Ile Ala Arg Glu Val Glu Gly Lys Thr Pro Glu Glu Val 

870 875 880 

Ile Glu Tyr Ser Ala Val Phe Trp Glu Arg Cys Asn Glu Leu Gln Asp 

885 890 895 

Ile Glu Lys Ile Met Ala Gln Ile Glu Arg Gly Glu Ala Arg Ile Gln 

900 905 910 

Arg Arg Ile Ser Ile Lys Lys Ala Leu Asp Thr Lys Ile Gly Arg Tyr 

915 920 925 

Lys Ala Pro Phe His Gln Leu Arg Ile Ser Tyr Gly Thr Asn Lys Gly 

930 935 940 

Lys Asn Tyr Thr Glu Glu Glu Asp Arg Phe Leu Ile Cys Met Leu His 

950 955 960 

Lys Leu Gly Phe Asp Lys Glu Asn Val Tyr Asp Glu Leu Arg Gln Cys 

965 970 975 

Ile Arg Asn Ser Pro Gln Phe Arg Phe Asp Trp Phe Leu Lys Ser Arg 

980 985 990 

Thr Ala Met Glu Leu Gln Arg Arg Cys Asn Thr Leu Ile Thr Leu Ile 

995 1000 1005 

Glu Arg Glu Asn Met Glu Leu Glu Glu Lys Glu Lys Ala Glu Lys Lys 

1010 1015 1020 

Lys Arg Gly Pro Lys Pro Ser Thr Gln Lys Arg Lys Met Asp Gly Ala 
1025 1030 1035 1040 

Pro Asp Gly Arg Gly Arg Lys Lys Lys Leu Lys Leu 
1045 1050 

<210> 6 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> /note= "synthetic construct" 
<400> 6 

tcaaggagat gattcgggcg t 21 

<210> 7 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> /note= "synthetic construct" 

<400> 7 
aaaggaccca tttacagaac ac 

<210> 8 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> /note= "synthetic construct" 

<400> 8 
gctggaaggg aaagcttaac aacc 

<210> 9 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> /note= "synthetic construct" 

<400> 9 
acactgccat cgattctgca aacc 

1. An isolated nucleic acid molecule 
comprising a gene located on Arabidopsis thaliana 
chromosome 5, lower arm, said gene occupying a segment of 
said chromosome 5, lower arm, flanked on the centromeric 
side within 20 kilobases by a gene encoding a zinc-finger 
protein and on the telomeric side within 1 kilobase by a 
gene encoding a glutamic acid tRNA, the disruption of 
said gene being associated with DNA hypomethylation. 

2. The nucleic acid molecule of claim 1, 
wherein said gene is composed of exons that form an open 
reading frame having a sequence that encodes a 
polypeptide about 750-850 amino acids in length. 

3. A cDNA molecule comprising the exons of the 
nucleic acid molecule of claim 2. 

4. The nucleic acid molecule of claim 2, 
wherein said open reading frame encodes an amino acid 
sequence substantially the same as SEQ ID NO:2. 

5. The nucleic acid molecule of claim 4, 
wherein said open reading frame encodes amino acid SEQ ID 
NO:2. 

6. The nucleic acid molecule of claim 5, which 
comprises an open reading frame of SEQ ID NO:1. 

7. A recombinant DNA molecule, comprising a 
vector having an insert that includes the nucleic acid 
molecule of claim 1. 



8. The recombinant DNA molecule of claim 7, 
which is cosmid C38, ATCC Accession No. 207208. 

9. An oligonucleotide between about 10 and 100 
nucleotides in length, which specifically hybridizes with 
a portion of the nucleic acid molecule of claim 1. 

10. An isolated nucleic acid molecule which is 
a gene, the disruption of which is associated with DNA 
hypomethylation, having a sequence selected from the 
group consisting of: 

a) SEQ ID NO:1; 

b) an allelic variant or natural mutant of 
SEQ ID NO:1; 

c) a sequence hybridizing with part or 
all of SEQ ID NO:1 or its complement and encoding a 
polypeptide substantially the same as part or all of a 
polypeptide encoded by SEQ ID NO:1; 

d) a sequence encoding part or all of a 
polypeptide having amino acid SEQ ID NO:2; and 

e) a sequence encoding part or all of a 
polypeptide contained in the cosmid clone C38, designated 
ATCC Accession No. 207208. 

11. A polypeptide produced by expression of an 
isolated nucleic acid molecule comprising part or all of 
an open reading frame of a gene located on Arabidopsis 
thaliana chromosome 5, lower arm, said gene occupying a 
segment of said chromosome 5, lower arm, flanked on the 
centromeric side within 20 kilobases by a gene encoding a 
zinc-finger protein and on the telomeric side within 1 
kilobase by a gene encoding a glutamic acid tRNA, the 
disruption of said gene being associated with DNA 
hypomethylation. 



12. The polypeptide of claim 11, produced by 
expression of a sequence selected from the group 
consisting of: 

a) SEQ ID NO:1; 

b) an allelic variant or natural mutant of 
SEQ ID NO:1; 

c) a sequence hybridizing with part or 
all of SEQ ID NO:1 or its complement and encoding a 
polypeptide substantially the same as part or all of a 
polypeptide encoded by SEQ ID NO:1; 

d) a sequence encoding part or all of a 
polypeptide having amino acid SEQ ID NO:2; and 

e) a sequence encoding part or all of a 
polypeptide contained in the clone designated ATCC 
Accession No. 207208. 

13. The polypeptide of claim 11, having the 
amino acid sequence of part or all of SEQ ID NO: 2. 

14. An antibody immunologically specific for 
the polypeptide of claim 11. 

15. An isolated nucleic acid molecule having a 
sequence substantially the same as SEQ ID NO: 3. 

16. An isolated protein encoded by an 
Arabidopsis thaliana gene, said protein being a member of 
an SWI2/SNF2 family of polypeptides, loss of function of 
said protein being associated with DNA hypomethylation. 

17. The protein of claim 16, encoded by a gene 
located on A. thaliana chromosome 5, lower arm, 
centromerically flanked within 20 kilobases by a 
zinc-finger-encoding gene and telomerically within one 
kilobase by a gene encoding a glutamic acid tRNA. 

18. The protein of claim 16, encoded by a DNA 
segment on a recombinant cosmid C38, having ATCC 
Accession No. 207208. 



19. The protein of claim 16, having amino acid 
SEQ ID NO:2. 

20. A transgenic organism comprising the 
nucleic acid molecule of claim 1. 

21. The transgenic organism of claim 20, which 
is a plant. 

22. A method of stabilizing fidelity of DNA 
methylation in an organism, comprising transforming the 
organism with the nucleic acid molecule of claim 1. 

23. A method of reducing or eliminating gene 
silencing in a plant, comprising inhibiting or preventing 
expression of an endogenous DDM1 gene of the plant. 

24. A method of introducing inbreeding 
depression in a plant, comprising inhibiting or 
preventing expression of an endogenous DDM1 gene of the 
plant. 